Lung Disease Classification using Deep Convolutional
Neural Network
Zeenat Tariq, Sayed Khushal Shah, Yugyung Lee
School of Computing and Engineering, University of Missouri-Kansas City, USA
zt2gc@mail.umkc.edu, ssqn7@mail.umkc.edu, leeyu@umkc.edu
Abstract—Advanced technologies are essential to improving medicine. More specifically, close collaboration among researchers, health care providers, and patients is integral to delivering precise and customized treatment strategies for various diseases. This paper aims to assess the degree of accuracy achievable in the medical field by applying deep learning to publicly available data. First, we extracted spectrogram features and labels from the annotated lung sound samples and used them as input to our 2D Convolutional Neural Network (CNN) model. Second, we normalized the lung sounds to remove peak values and noise. Because the publicly available data was not sufficient for deep learning, we also applied data augmentation. Finally, we created a deep learning model called Lung Disease Classification (LDC), combining advanced data normalization and data augmentation techniques for high-performance classification in lung disease diagnosis.
The final accuracy obtained after normalization and augmentation was approximately 97%. The proposed model paves the way for adequate assessment of the degree of accuracy acceptable in the medical field and achieves better performance than previously reported approaches.
Index Terms—Data normalization, Data augmentation, Convolutional neural network, Lung sound classification, Deep learning.
I. INTRODUCTION
Lung sounds are the acoustic signals generated from breath-
ing. An auscultatory method has been applied widely by
physicians to examine lung sounds associated with different
respiratory symptoms. The auscultatory method has been the
easiest way to diagnose patients with respiratory diseases such
as pneumonia, asthma, and bronchiectasis [1], [2]. However,
it is a manual process that takes considerable time, and its accuracy varies with the complexity of the sound patterns and characteristics. This carries a high risk of missed findings, leading to underdiagnosed or misdiagnosed cases [3], [4]. Auscultation is not always reliable: one study found that medical residents could not identify all of the wheezing sounds in a series of pulmonary disease recordings [5].
Machine learning plays an important role in classifying
different types of sounds through multiple algorithms [6].
Deep learning is a branch of machine learning, which has
attracted a lot of attention due to its high performance in
prediction and classification. Deep learning techniques are among the fastest-growing approaches in the area of audio classification [7]. These classifiers can outperform human listeners because they are robust to noise and free of human memory limitations.
In this paper, we have applied deep learning techniques to improve the classification results for the diagnosis of respiratory symptoms. We propose a model that is uniquely designed around a popular deep learning network, the Convolutional Neural Network (CNN). Specifically, we introduce advanced preprocessing techniques such as normalization and augmentation for effective lung sound classification. The classification is based on spectrogram features extracted from the audio dataset. Traditional classification results vary due to noise in the audio samples caused by environmental interference. Existing CNN approaches, which are based purely on audio feature techniques, adopt different architectures and obtain accuracies between 80% and 95% with very high memory consumption. The dataset used for the experimentation is a public dataset provided for research in [8].
One of the challenges in this research was finding publicly available data and cleaning recordings that were captured improperly and could not be accepted as input for a class. Because the audio is recorded directly from the lungs, the samples may contain noise from the heart or other sounds within the body. To improve accuracy, we applied data normalization to the original data, rescaling the audio samples to more consistent levels and average values.
Deep learning relies on large amounts of data. Due to the limited amount of publicly available data, research progress in this field has been limited. To tackle this problem, we adopted a solution known in the deep learning field as data augmentation [9]. To improve our results further, we needed larger amounts of data; for that purpose, we applied data augmentation techniques, which help the CNN model achieve better accuracy. Our model was observed to outperform all previously reported models. Other researchers could not experiment with large amounts of data, while data augmentation allowed our approach to stand out and outperform prior work.
II. RELATED WORK
Chen et al. [10] proposed a novel solution for lung sounds
classification by using a publicly available dataset. The dataset
was divided into three categories, i.e., wheezes, crackles and
normal. They proposed a detection method using the optimized S-transform (OST) and deep residual networks (ResNets), preprocessing the audio samples with OST to rescale the features for the ResNets.

Fig. 1. Block Diagram for LDC System

Bardou et al. [11] compared four machine learning approaches for lung sound classification using a lung sound dataset. In their experiments, the CNN outperformed all other classifiers; however, this depends on the batch size and the number of epochs. Although they obtained an accuracy of approximately 97%, their machine utilization was very high, requiring almost one million epochs or more. Dubey and Bodade [12] reviewed several feature extraction and classification techniques for pulmonary obstructive diseases such as COPD and asthma. The feature extraction methods covered were FFT, STFT, spectrograms, and wavelet transforms. The best accuracy reported for a CNN was approximately 95%.
Chen et al. [13] proposed a solution for automatic early detection of heart and lung disease using a CNN. They collected data from volunteer patients, which doctors manually annotated for the experiments. However, the dataset was too limited to draw firm conclusions.
Salamon and Bello [14] presented the data augmentation
technique for environmental sound classification using CNN.
The deformation of audio was performed through stretching,
pitch shifting, dynamic range compression, and background
noise. Piczak [15] proposed a CNN model for the classification of environmental sounds. The architecture consists of two convolutional layers with rectified activations and max pooling, two fully connected hidden layers, and a softmax output layer. The data was augmented through random time delays and pitch shifting. Mel spectrograms were extracted from all audio samples, then resampled and normalized.
III. METHOD
A. Data Normalization
In this paper, we have evaluated existing normalization
techniques and selected three best ones for the evaluation.
Root Mean Square Normalization In Root Mean Square (RMS) normalization, the amplitude level is set using the effective average of the signal amplitude, which is not the arithmetic mean of the received signal.
The RMS level is useful for measuring signal strength based on amplitude, regardless of whether the signal values are positive or negative. For a given signal $x = x_1, x_2, \ldots, x_n$, the RMS value $x_{\mathrm{rms}}$ is:
x_{\mathrm{rms}} = \sqrt{\overline{x^2}} = \sqrt{\frac{1}{n}\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)}    (1)
Signal amplitude normalization is only possible if we can determine the scaling factor that performs the linear gain change. It is possible to scale a signal to an amplitude greater than 1, i.e., above 0 decibels (dB). To apply the linear gain change, we can rearrange the RMS level formula as shown in Equation 2, where R is the desired RMS level on a linear scale.
R = \sqrt{\frac{1}{n}\left[(ax_1)^2 + (ax_2)^2 + \cdots + (ax_n)^2\right]}, \qquad a = \sqrt{\frac{nR^2}{x_1^2 + x_2^2 + \cdots + x_n^2}}    (2)
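A minimal NumPy sketch of Equation 2, with the target level target_db as an assumed parameter (our illustration, not the paper's code):

```python
import numpy as np

def rms_normalize(x, target_db=-20.0):
    """Scale signal x so its RMS level matches target_db (dBFS)."""
    # Desired RMS level R on a linear scale.
    r = 10 ** (target_db / 20.0)
    # Scaling factor a = sqrt(n * R^2 / sum(x_i^2)) from Eq. 2.
    a = np.sqrt(len(x) * r**2 / np.sum(x**2))
    return a * x
```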
Peak Normalization In peak normalization, the peak signal level is measured in decibels relative to full scale (dBFS), and the signal volume is amplified so that the output peaks at a maximum of 0 dB. This scales the amplitude of every input audio signal such that its highest absolute amplitude has a value of 1. The output signal under this scaling is calculated as
out = \frac{1}{\max(\lvert in \rvert)} \cdot in    (3)
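A one-line NumPy sketch of Equation 3 (our illustration):

```python
import numpy as np

def peak_normalize(x):
    """Scale signal x so its maximum absolute amplitude is 1 (Eq. 3)."""
    return x / np.max(np.abs(x))
```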
EBU Standard R128 Normalization The European Broadcasting Union (EBU) Standard R128 normalizes audio signals by measuring the average loudness of a program and adjusting it to a target level.
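As an illustration (not the paper's code), EBU R128 loudness normalization can be performed with the pyloudnorm package, assuming the R128 reference level of -23 LUFS as the target:

```python
import pyloudnorm as pyln
import soundfile as sf

# Load an audio file (path is a placeholder).
data, rate = sf.read("lung_sound.wav")

# Measure integrated loudness per EBU R128 / ITU-R BS.1770.
meter = pyln.Meter(rate)
loudness = meter.integrated_loudness(data)

# Normalize to the assumed R128 target of -23 LUFS.
normalized = pyln.normalize.loudness(data, loudness, -23.0)
```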
B. Data Augmentation
We have experimented different types of data augmentation
and concluded to experiment our results in three different ways
such as time stretching, pitch shifting, and dynamic range
compression [16]. Initially, the original data consists of 920
audio samples. After applying the augmentation techniques,
the total audio samples obtained including the original audio
samples were 11960. The files size that occupied the storage
Fig. 2. Spectrogram Feature Extraction
was 26GB. For data augmentation, it is important to select the
deformation patterns in such a way that the original labels are
maintained and augmented.
Time Stretching For augmentation, the speed of each audio sample is increased or decreased by a set of factors [17]. We used four speed factors, i.e., 0.5, 0.7, 1.2, and 1.5, along with the original audio sample files.
Pitch Shifting For data augmentation, the pitch of each audio sample is decreased or increased by four values (semitones) [18]. The duration of the audio samples is kept constant, matching the original audio samples, i.e., 4-10 seconds. The semitone shifts used were -2, -1, 1, and 2.
Dynamic Range Compression This technique compresses the dynamic range of each audio sample using four parameter settings; three are taken from the Dolby E standard and one from the Icecast live-streaming radio server.
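As a sketch (not the paper's code), the time-stretching and pitch-shifting deformations can be produced with librosa, using the factors listed above; dynamic range compression typically relies on external tools (e.g., the Dolby E and Icecast presets) and is omitted here:

```python
import librosa

# Path and sampling rate are placeholders.
y, sr = librosa.load("lung_sound.wav", sr=22050)

# Time stretching: the four speed factors, kept alongside the original.
stretched = [librosa.effects.time_stretch(y, rate=r)
             for r in (0.5, 0.7, 1.2, 1.5)]

# Pitch shifting: four semitone shifts, duration unchanged.
shifted = [librosa.effects.pitch_shift(y, sr=sr, n_steps=s)
           for s in (-2, -1, 1, 2)]
```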
C. Network Model
Figure 1 shows the block diagram of the LDC system. Data normalization and data augmentation are applied to the lung sound data, and spectrogram features are extracted from the regenerated audio samples. These extracted features are passed to a 2D CNN for classification. A convolutional neural network has two main components, i.e., a feature extractor and a classifier. The feature extractor extracts the spectrogram features from the audio signal and passes them to the classifier, which assigns the signals to their appropriate categories. The classifier consists of convolutional and pooling layers, followed by activation and fully connected layers used for classification. The mathematical form of the convolutional layers is given in Equations 4 and 5.
x^{l}_{i,j,k} = \sum_{a}\sum_{b}\sum_{c} w^{(l-1,f)}_{a,b,c}\, y^{(l-1)}_{i+a,\,j+b,\,k+c} + bias_{f}    (4)

y^{l}_{i,j,k} = \sigma\left(x^{(l)}_{i,j,k}\right)    (5)
The layer output is represented by $y^{l}_{i,j,k}$, where $i, j, k$ index the 3-dimensional input tensor. The filter weights are denoted by $w^{(l)}_{i,j,k}$, and $\sigma(x^{(l)}_{i,j,k})$ denotes the sigmoid activation function. The fully connected layer is represented by Equations 6 and 7.
x^{(l)}_{i} = \sum_{j} w^{l-1}_{i,j}\, y^{l-1}_{j} + bias^{l-1}_{j}    (6)

y^{(l)}_{i} = \sigma\left(x^{(l)}_{i}\right)    (7)

Fig. 3. Classification Accuracy: (a) Original, (b) Normalized, (c) Augmented
The 2D CNN architecture is composed of five layers. The first three are convolutional layers with max pooling, followed by two fully connected layers. We extracted Mel spectrogram features with librosa because, for noisy data, spectrograms are considered among the best representations for differentiating between types of sounds. During feature extraction, we used a window size and hop size of 23 ms. Since the sound clips vary between 3 and 10 seconds, we fixed the extraction window at 3 seconds so that every part of each sound clip is usable. The input from the sound clips is reshaped, and features of shape $X \in \mathbb{R}^{128 \times 128}$ are provided to the classifier.
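The following is a minimal sketch of this feature extraction, assuming librosa with a 22,050 Hz sampling rate; the exact n_fft and hop_length are not given in the paper, so 512 samples (about 23 ms) is our assumption:

```python
import librosa
import numpy as np

def extract_features(path, sr=22050, duration=3.0):
    """Extract a 128x128 log-Mel spectrogram from a lung sound clip."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    # ~23 ms window and hop: 512 samples at 22,050 Hz (assumed).
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                         hop_length=512, n_mels=128)
    logmel = librosa.power_to_db(mel)
    # Pad or crop the time axis to 128 frames -> shape (128, 128).
    t = logmel.shape[1]
    if t < 128:
        logmel = np.pad(logmel, ((0, 0), (0, 128 - t)))
    return logmel[:, :128]
```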
The first layer takes the reshaped spectrogram features as input with 24 filters, giving a weight shape of [24x1x5x5]. The stride in this layer is [4x2], with ReLU as the activation function. The second layer has 48 filters of shape [48x24x5x5] with a [4x2]-stride max-pooling layer and ReLU activation. The third layer also has 48 filters with a [5x5] receptive field, resulting in shape [48x48x5x5], with ReLU activation and no pooling. Finally, the fourth layer has 64 hidden units, resulting in shape [2000x64] with ReLU activation, followed by a [64x10] output layer with softmax activation. In the top layers, we used [5x5] receptive fields, which are very small, to capture localized patterns.
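As an illustration only, a Keras sketch following our reading of this description is given below; the padding, pooling size, and flattened dimension are our assumptions (the text reports [2000x64], while this sketch yields a slightly different flattened size), and the 10-way softmax follows the stated [64x10] output shape:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(128, 128, 1)),                  # 128x128 spectrogram
    layers.Conv2D(24, (5, 5), strides=(4, 2), activation="relu"),
    layers.Conv2D(48, (5, 5), activation="relu"),
    layers.MaxPooling2D(pool_size=(4, 2)),             # [4x2] pooling
    layers.Conv2D(48, (5, 5), activation="relu"),      # no pooling
    layers.Flatten(),
    layers.Dense(64, activation="relu"),               # fully connected
    layers.Dense(10, activation="softmax"),            # [64x10] output
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```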
IV. EXPERIMENTAL DESIGN AND RESULTS
The dataset comprises a total of 5.5 hours of recordings, divided into samples from 126 patients. The categories include Asthma, Bronchiectasis, Chronic Obstructive Pulmonary Disease (COPD), Healthy, Upper Respiratory Tract Infection (URTI), Lower Respiratory Tract Infection (LRTI), and Pneumonia. Table I shows the categories and the number of samples in the dataset.
We used librosa [19] for the spectrogram feature extraction. Figure 2 shows the spectrogram features extracted from the lung sound dataset.
TABLE I
ORIGINAL AND AUGMENTED DATA SIZE

ID  Name of Disease   #Audio Files  #Augmented Audio Files
1   Asthma                  1              13
2   Bronchiectasis         29             377
3   COPD                  785           10205
4   Healthy                35             455
5   LRTI                    2              26
6   Pneumonia              37             481
7   URTI                   31             403
TABLE II
LDC SYSTEM MODEL RESULT COMPARISON

Model    Technique                                Accuracy
Model 1  Original Data                            83%
Model 2  Peak Value Normalization                 86%
Model 3  RMS Normalization                        87%
Model 4  EBU Normalization                        88%
Model 5  Augmentation applied on Original Data    93%
Model 6  Normalized Peak Value Augmentation       92%
Model 7  Normalized RMS Value Augmentation        94%
Model 8  Normalized EBU Value Augmentation        97%
We designed our experiments to evaluate the proposed lung sound classification based on the 2D CNN with the lung sound dataset. The dataset is split 70%/30% for training and testing. The batch size was 32, and the number of epochs was fixed at 100 to avoid over- or under-fitting. The results for each instance of the LDC system are shown in Table II. We observed during experimentation that the highest accuracy achieved by existing research is 97%, which depends on heavy GPU usage and memory consumption. Table II shows the LDC experimentation results for Models 1-8. Although the data was not sufficient for training, we were able to achieve good results.
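A minimal sketch of this training setup, assuming the Keras model above and placeholder arrays X (spectrograms) and y (one-hot labels):

```python
from sklearn.model_selection import train_test_split

# X: (n_samples, 128, 128, 1) spectrograms; y: one-hot labels (placeholders).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)

# Batch size 32 and 100 epochs, as stated in the text.
model.fit(X_train, y_train, batch_size=32, epochs=100,
          validation_data=(X_test, y_test))
loss, acc = model.evaluate(X_test, y_test)
```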
We first ran the 2D CNN classification network on the original dataset, which yielded an accuracy of approximately 83%. We then applied the three types of normalization, i.e., Peak, RMS, and EBU, and obtained accuracies of 86%, 87%, and 88%, respectively. Data augmentation is considered the leading technique in deep learning for small datasets; the accuracy of the 2D CNN on the augmented original data was 93%. We also applied the three augmentation techniques to the normalized data, and the highest accuracy achieved was 97%. Even though the data was limited and the recordings contained substantial variation and environmental interference (i.e., heartbeats, running fans), our technique achieved very good accuracy in comparison with state-of-the-art feature-based approaches.
Figure 3(a)-(c) shows the accuracy of the 2D CNN on the lung sound dataset for the original, normalized, and augmented data. When the data was in its original form, the CNN ran into overfitting, and the highest accuracy reported was between 83% and 86%. The small variations in accuracy across models are due to the nature of the data. After normalization, we noticed that accuracy improved, ranging between 85% and 90%. Finally, with augmentation, there is a visible increase in accuracy, reported between approximately 96% and 99%. The results obtained in our experimentation outperform the method proposed in [11].
V. CONCLUSION
In this paper, we developed the Lung Disease Classification (LDC) system, which combines advanced data normalization and data augmentation techniques for high-performance classification in lung disease diagnosis. We obtained 97% accuracy, better than the state-of-the-art accuracy. This confirms that the proposed model could be used for the diagnosis of lung diseases from lung sounds in health care.
REFERENCES
[1] E. Pacht, J. Turner, M. Gaillun, L. Violi, D. Ralston, H. Mekhjian, and
R. John, “Effectiveness of telemedicine in the outpatient pulmonary
clinic,” Telemedicine Journal, vol. 4, no. 4, pp. 287–292, 1998.
[2] Y. Kahya, E. C. Guler, and S. Sahin, "Respiratory disease diagnosis using lung sounds," in Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society: 'Magnificent Milestones and Emerging Opportunities in Medical Engineering'. IEEE, 1997, vol. 5, pp. 2051–2053.
[3] J. Kaur, K. Chugh, A. Sachdeva, and L. Satyanarayana, “Under diagnosis
of asthma in school children and its related factors,” Indian pediatrics,
vol. 44, no. 6, pp. 425, 2007.
[4] A. Mandke and K. Mandke, “Under diagnosis of copd in primary care
setting in surat, india,” 2015.
[5] S. Mangione and L. Nieman, “Pulmonary auscultatory skills during
training in internal medicine and family practice,” Am J respiratory &
critical care medicine, vol. 159, no. 4, pp. 1119–1124, 1999.
[6] J. Geiger and K. Helwani, "Improving event detection for audio surveillance using gabor filterbank features," in 23rd European Signal Processing Conference. IEEE, 2015, pp. 714–718.
[7] L. Deng, D. Yu, et al., “Deep learning: methods and applications,”
Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp.
197–387, 2014.
[8] BM Rocha, D Filos, L Mendes, I Vogiatzis, E Perantoni, et al., “A respi-
ratory sound database for the development of automated classification,”
in Precision Medicine Powered by pHealth and Connected Health, pp.
33–37. Springer, 2018.
[9] I. Rebai, Y. BenAyed, W. Mahdi, and J. P. Lorré, "Improving speech recognition using data augmentation and acoustic model fusion," Procedia Computer Science, vol. 112, pp. 316–322, 2017.
[10] H. Chen, X. Yuan, Z. Pei, M. Li, and J. Li, “Triple-classification
of respiratory sounds using optimized s-transform and deep residual
networks,” IEEE Access, vol. 7, pp. 32845–32852, 2019.
[11] D. Bardou, K. Zhang, and S. Ahmad, “Lung sounds classification using
convolutional neural networks,” Artificial intelligence in medicine, vol.
88, pp. 58–69, 2018.
[12] R. Dubey and R. M. Bodade, “A review of classification techniques
based on neural networks for pulmonary obstructive diseases,” 2019.
[13] Q. Chen, W. Zhang, X. Tian, X. Zhang, S. Chen, and W. Lei, “Auto-
matic heart and lung sounds classification using convolutional neural
networks,” in 2016 Asia-Pacific Signal and Information Processing
Association Annual Summit and Conference. IEEE, 2016, pp. 1–4.
[14] J. Salamon, C. Jacoby, and JP. Bello, “A dataset and taxonomy for
urban sound research,” in Proceedings of the 22nd ACM international
conference on Multimedia. ACM, 2014, pp. 1041–1044.
[15] K. Piczak, "Environmental sound classification with convolutional neural networks," in IEEE 25th International Workshop on Machine Learning for Signal Processing. IEEE, 2015, pp. 1–6.
[16] LR. Aguiar, Y. Costa, and NC. Silla, “Exploring data augmentation to
improve music genre classification with convnets,” in 2018 International
Joint Conference on Neural Networks. IEEE, 2018, pp. 1–8.
[17] S. Wei, K. Xu, D. Wang, F. Liao, H. Wang, and Q. Kong, “Sample
mixed-based data augmentation for domestic audio tagging,” arXiv
preprint arXiv:1808.03883, 2018.
[18] N. Davis and K. Suresh, “Environmental sound classification using
deep convolutional neural networks and data augmentation,” in Recent
Advances in Intelligent Computational Systems. IEEE, 2018, pp. 41–45.
[19] B. McFee, C. Raffel, D. Liang, D. PW Ellis, M. McVicar, E. Battenberg,
and O. Nieto, “librosa: Audio and music signal analysis in python,” in
Proceedings of the 14th python in science conference, 2015, pp. 18–25.