Journal Pre-proofs
Comparative analysis of Machine Learning approaches for early stage Cervi‐
cal Spondylosis detection
M. Sreeraj, Jestin Joy, Manu Jose, Meenu Varghese, T.J. Rejoice
PII: S1319-1578(20)30448-1
DOI: https://doi.org/10.1016/j.jksuci.2020.08.010
Reference: JKSUCI 832
To appear in: Journal of King Saud University - Computer and
Information Sciences
Received Date: 17 April 2020
Accepted Date: 19 August 2020
Please cite this article as: Sreeraj, M., Joy, J., Jose, M., Varghese, M., Rejoice, T.J., Comparative analysis of
Machine Learning approaches for early stage Cervical Spondylosis detection, Journal of King Saud University -
Computer and Information Sciences (2020), doi: https://doi.org/10.1016/j.jksuci.2020.08.010
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover
page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version
will undergo additional copyediting, typesetting and review before it is published in its final form, but we are
providing this version to give early visibility of the article. Please note that, during the production process, errors
may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 Production and hosting by Elsevier B.V. on behalf of King Saud University.
Comparative analysis of Machine Learning approaches
for early stage Cervical Spondylosis detection
Abstract
Cervical Spondylosis (CS) is a chronic spinal condition in which the spine gradually
stiffens and can finally become completely inflexible. It is difficult to diagnose in
its early stages, which delays treatment. The risk level of cervical spondylosis can
be reduced if it is detected in primary care. With this objective, a system is designed
and developed to diagnose and predict the severity of cervical spondylosis in its early
stages. Different machine learning techniques are evaluated for this task, and the
results indicate that machine learning can provide a low-cost and accurate mechanism
for early-stage spondylosis detection.
Keywords: Cervical Spondylosis (CS), CNN, ORB, Sitting posture, Machine Learning.
1. Introduction
A common cause of spinal cord dysfunction in elderly persons is Cervical Spondylosis
(CS). Lifestyle, age, occupation and many other factors can accelerate the progression
of spine-related diseases. Early detection followed by medication can help to reduce
the risk level of the disease. Technology to detect cervical spondylosis in its early
stages is largely nonexistent, since the condition is very difficult to diagnose in its
primary stage.
Cervical Spondylosis (CS) results from chronic deterioration of the vertebrae and discs
of the neck. The bony projections that form along joints are referred to as bone spurs
or osteophytes. The disease is also often linked with arthritis [1, 2, 3]. In cervical
spondylosis, the degree of pain differs from individual to individual. By practising a
healthy lifestyle, one can reduce the risk of CS.
According to a recent study [4] by the Office for National Statistics [5], the number
of people with lower back pain and neck pain is growing every day, affecting their
ability to work. Reports indicate that around 31% of men and 20% of women face this
issue. Lower back pain is a problem faced by employees who sit for long hours in a
particular posture, and it reduces their ability to work efficiently. Timely detection
of the bad postures that cause lower back pain helps to diagnose and treat the problem.
Recent studies have shown that sitting habits have great significance for physical
health. Different measuring tools are used to evaluate neck pain caused by cervical
spondylosis, but these systems take more time to detect the condition and have lower
accuracy. This calls for a better solution to detect and predict the risk level of
cervical spondylosis. A low-cost, fast and efficient system is proposed in this paper.
The system presents a detection method using data from a camera. The overall working
of the system is as follows. A camera is placed either to the right or left side of
the person and continuously records video of the person sitting in a chair for around
3-4 hours. The video is then split into frames, and the similarity between the
resulting images is calculated. If a frame recurs in more than 60% of the video, it
is selected for further processing. The selected frame is tested using the machine
learning model. The images are classified into four different classes: normal, mild,
average and high risk levels. Key points are identified, those of the cervical spine
and lower back are extracted, and angles calculated from them are used for
classification.
The proposed system can be used by the general public to find out their chances of
developing cervical spondylosis at an early stage by inputting a video of their sitting
posture. The system predicts the level of risk involved using machine learning
algorithms. Different algorithms based on shallow learning and deep learning techniques
are evaluated for this purpose.
2. Literature Review
Cervical Spondylosis (CS) is usually asymptomatic, but may present with symptoms such
as neck pain, stiffness, shoulder pain, severe joint pain in the arms, and altered
tactile sensation. These point to age-related, chronic degenerative CS of the discs.
Neck pain is one of the most common issues among patients with CS. Clinical
examination, spinal angiography, X-ray, MRI scan and computed tomography [6] are the
diagnostic methods currently used. Among these, X-ray is widely used because it is cost
effective and involves low radiation. The main issue with X-ray evaluation, however, is
its low accuracy: the result depends on the experience and knowledge of the clinicians
[7]. Different clinicians may therefore give different clinical reports based on the
same X-ray analysis, which makes it difficult to diagnose the risk level of CS.
MRI images can also be used to diagnose CS. MRI differs from X-ray in that it does not
involve ionizing radiation. The computer produces cross-sectional images of the body
that are converted into three-dimensional (3-D) images of the scanned area. This helps
to pinpoint problems in the cervical spine when the scan focuses on that area. MRI
images show the details of soft tissues such as cartilage and nerve roots, and this
test can show spinal compression more clearly than X-rays. An MRI scan of the cervical
spine is ordered only if the pain persists even after normal treatment. One drawback of
MRI is that a scan takes around 30-45 minutes. There are also preconditions: before
performing an MRI scan, the doctor must ensure that the person is not diabetic, does
not have kidney problems, and is not in the first trimester of pregnancy; in these
cases the doctor must take extra care [8].
A cervical spine CT scan [9] is a medical tool that builds a visual image of the
cervical spine using advanced X-ray equipment and computer imaging. The cervical spine
is the part of the spine passing through the neck, which is why the examination is
often called a neck CT scan. By testing bone density, it can help a doctor assess the
severity of bone disorders, such as arthritis or CS, and classify them. A normal X-ray
introduces a small amount of radiation into the patient's body. Bone and soft tissue
absorb radiation differently, so they appear on the X-ray film in different shades. A
CT scan works similarly, but many X-rays are taken in a spiral fashion instead of one
flat image, providing more information and accuracy. When the patient is inside the
scanner, several X-ray beams pass in a circular motion across the upper torso and neck
while electronic X-ray detectors measure the radiation the body absorbs. This
information is interpreted by a computer to produce separate images, called slices,
which are then combined to create a 3-D model of the cervical spine.
Computer technology has advantages such as early stage diagnosis, less time and effort,
cost effectiveness, and higher efficiency and accuracy. Machine learning based
algorithms [10] for detecting Cervical Spondylosis have been discussed in the
literature; most of these methods use MRI images for classification.
Kei Hirano et al. [11] proposed a novel approach for quantitative analysis of the
relationships between elderly patients and consumer goods. The authors introduced
three functions: robust pose estimation, standardization, and a clustering process.
It specifically supports elderly people with physical and cognitive impairment (e.g.
dementia). Mikel Ariz et al. [12] proposed a method for head pose estimation using 2D
tracking of the face, enhancing both 2D point tracking and 3D pose estimation. A
baseline scheme for pose estimation is exploited, and a novel weighted variant of the
POSIT algorithm is proposed in this work.
Eduardo Ramirez et al. [13] proposed a hybrid model as a classification method for
2-lead cardiac arrhythmias, developed using artificial neural networks and fuzzy logic.
Ivette Miramontes et al. [14] describe the optimal design of type-1 and interval type-2
fuzzy systems, designed and optimized first with trapezoidal membership functions and
second with Gaussian membership functions; the Crow Search Algorithm and the Bird Swarm
Algorithm were compared for performance. P. Melin, I. Miramontes et al. [15] proposed a
hybrid model using modular networks and a fuzzy system developed for hypertension risk
diagnosis; the modular network shows a learning accuracy of 98%, 97.62% and 97.83% in
the first, second and third modules respectively. O. Castillo, P. Melin et al. [16]
explained a hybrid intelligent system for arrhythmia classification that combines fuzzy
KNN and neural networks with a Mamdani fuzzy system. The methods used for
classification were Fuzzy K-Nearest Neighbors, Multi Layer Perceptron with Gradient
Descent with momentum, and Multi Layer Perceptron with Scaled Conjugate Gradient
Backpropagation; 98% accuracy was obtained using the Mamdani type fuzzy inference
system.
3. Design and Implementation
The proposed system is a patient-assistive system that can be used to detect the onset
of cervical spondylosis in its early stages. Both shallow learning and deep learning
techniques are evaluated for this.
The captured videos of the individuals are used to find out the chances of developing
cervical spondylosis. 72 volunteers were recruited for collecting data. They were
divided into four classes of 18 persons each. Individuals in each of these four classes
were asked to sit in a prescribed position (normal, mild, average or high), denoting
the chances of developing cervical spondylosis. Their positions were evaluated by a
medical practitioner who is an expert in the field of cervical spondylosis. Across
these four classes, a 6-fold cross-validation method was used for evaluation: 83.33%
(60 videos) for training and the remaining 16.66% (12 videos) for testing. The block
diagram of the proposed system is shown in Figure 1.
Figure 1: Block diagram of the proposed system
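The 6-fold split described above can be sketched as follows; the function name and the
fixed random seed are assumptions made for illustration, with the 72 videos represented
by their indices.

```python
import numpy as np

def six_fold_splits(n_videos=72, n_folds=6, seed=0):
    """Partition video indices into 6 folds; each fold holds out 12
    videos (16.66%) for testing and trains on the other 60 (83.33%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_videos)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test
```

Each video appears in the test partition of exactly one fold, so every sample is used
for evaluation once across the six folds.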
The main objective of the first method is to find the best frame, i.e. the frame that
best captures the sitting position of a person, based on the time duration the person
sits in a specific posture. To obtain the best frame, video is captured and split into
different frames containing the key frames. Different methods such as the Mean Squared
Error (MSE) method, the Reduced Average Method and keypoint descriptor methods are
evaluated for this. The best performance is obtained by keypoint descriptor methods
such as SIFT [17], SURF [18] and ORB [19].
Keypoints are detected from the Part Affinity Fields (PAFs) [20] representation, which
helps associate body parts with the human in the image. This method can easily identify
key points on the hands, feet and body.
Cervical Spondylosis has four different stages, from normal to severe. Based on these
risk levels, different target objects are created from the angles calculated, according
to the law of cosines, using the key points obtained in the previous stages. Among
shallow learning methods, K-NN and SVM classifiers are studied; in deep learning, YOLO
and CNN are considered. In YOLO, target object detection is performed on every frame of
the video, and the duration of continuous time spent in a particular position is also
calculated. In the CNN-based approach, every frame is passed through the network to
find the key points, and classification is performed on them.
3.1. Mean Squared Error Method (MSE)
MSE [21] is used to select the most frequently repeated frame in a particular interval
of time. At first, the starting frame is selected as the pivot element. If the person
is sitting upright, the sitting angle is assumed to be 90°; when the sitting angle
changes to 60°-65°, an error appears between frames. The MSE between the first frame
and the second frame is therefore computed. If no error or only a small error is found,
the first frame is then compared against the third frame, and this process continues.
When a major change occurs in a frame, that frame is set as the new pivot element.
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \tilde{y})^2 \qquad (1)
Likewise, all of the frames are subjected to MSE computation. Based on these error
values, a pattern is obtained, as shown in Figure 2. The main issue with this method is
that it is sensitive to outliers. This disadvantage of MSE can be overcome by using the
Minimum Mean Square Error (MMSE) or the Reduced Average Method (RAM).
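The MSE comparison of Eq. (1) and the pivot-scanning procedure can be sketched as
follows; the function names and the change threshold are assumptions, with frames
represented as NumPy arrays of equal size.

```python
import numpy as np

def frame_mse(frame_a, frame_b):
    """Mean squared error between two equally sized grayscale frames (Eq. 1)."""
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    return np.mean((a - b) ** 2)

def find_pivots(frames, threshold):
    """Scan frames against the current pivot; when the MSE exceeds the
    threshold, a major change has occurred and that frame becomes the
    new pivot element."""
    pivots = [0]
    for i in range(1, len(frames)):
        if frame_mse(frames[pivots[-1]], frames[i]) > threshold:
            pivots.append(i)
    return pivots
```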
3.2. Reduced Average Method
The key frames obtained using MSE are not accurate either, as MSE is sensitive to
outliers: the result contains frames with both very high and very low error values.
Using the Reduced Average Method, these frames can be discarded, leaving a set of
frames within a bounded range of error values. To obtain a feasible solution, the
average is computed. Based on these values, reduced-average stage 1 is illustrated in
Figure 3. The remaining issue with this technique is an outlier with repeated patterns,
which makes it difficult to find the most frequently repeated frame from the pattern.
Figure 2: Pattern generated after implementing MSE
Figure 3: Reduced average-stage1
From the illustration in Figure 3, it is observed that there exist some peak values,
which can be avoided through normalization. In normalization, the number of frame bins
is calculated from the maximum and minimum errors, and the errors are distributed into
this number of bins as follows:

\text{No. of frames} = \frac{\text{Max. error} - \text{Min. error}}{100} \qquad (2)

From these bins, the key frame is selected as the one with the highest frequency of
error: the most frequently repeated frames share a similar range of errors, as
illustrated in Figure 4, which gives a feasible but not optimal solution. The selected
frames are then subjected to further processing.
Figure 4: Reduced average-stage2
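The binning step of Eq. (2) can be sketched as follows; the function name and the
tie-breaking choice of the first frame in the most populated bin are assumptions.

```python
import numpy as np

def keyframe_by_error_frequency(errors, frame_ids):
    """Bin per-frame error values into (max - min) / 100 bins (Eq. 2)
    and return a frame from the most populated bin, i.e. the most
    frequently recurring error range."""
    errors = np.asarray(errors, dtype=float)
    n_bins = max(1, int((errors.max() - errors.min()) / 100))
    hist, edges = np.histogram(errors, bins=n_bins)
    b = np.argmax(hist)
    in_bin = (errors >= edges[b]) & (errors <= edges[b + 1])
    return frame_ids[np.flatnonzero(in_bin)[0]]
```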
3.3. Keypoint Descriptor Methods
This method considers dissimilar frames for computation. There are different keypoint
descriptor methods, such as SIFT [17], SURF [18] and ORB [19]. Our studies found this
to be the most feasible approach for finding the similarities between frames.
3.3.1. ORB
Oriented FAST and Rotated BRIEF (ORB) [19] is a fusion of the FAST keypoint detector
and the BRIEF descriptor, with many modifications to enhance performance. The top N
keypoints are obtained by applying the Harris corner measure to the keypoints found
through FAST, which yields multi-scale features. A limitation of FAST is that it does
not compute orientation. To improve rotation invariance, moments are computed with x
and y within a circular region of radius r, where r is the size of the patch. The ORB
features are used to find the similarity between images: first the video is split into
frames, and these frames are passed to ORB to find similar images.
The reason for finding similar images is that, when a person sits for a long time in a
similar posture, that posture is probably the comfortable one for that person.
3.3.2. SIFT
The SIFT [17, 22] algorithm consists of four main steps. In the first step, the
Difference of Gaussians (DoG) method is used to estimate scale-space extrema, followed
by keypoint localization in the second step; a refinement is also performed at this
stage to eliminate low-contrast points. In the third step, keypoint orientation is
assigned based on local image gradients. In the final step, a descriptor is computed
for each generated keypoint, serving as its local image descriptor; the generated
descriptor is a function of the gradient magnitude and orientation.
3.3.3. SURF
In the SURF [18] technique, the Difference of Gaussians (DoG) is approximated with the
aid of box filters. The technique uses squares for the approximation because the
convolution is faster when using integral images, and this process can also be
parallelized.
3.4. Keypoint detection
Keypoint detection for finding the features (the angle deviations corresponding to the
hip and neck) with respect to the sitting position of the person in a key frame is
described in Algorithm 1. The keypoint labels corresponding to the human body are
listed in Table 1.
Figure 5: Keypoints Labeling
Data: Key frames
Result: Angles corresponding to the hip and neck
while not at end of the frame sequence do
    Step 1: Preprocessing: convert the image from [0, 255] to [-1, 1]:
        img = img * (2.0 / 255.0) - 1.0
    Step 2: Pass the image through the neural network. The output is a heat map matrix
        and a Part Affinity Fields matrix.
    while not at end of the connections do
        Step 3: Non-maximum suppression (NMS): detect the body parts in the image by
            extracting the part locations (the local maxima) from the heatmap:
            3.1 Start at the first pixel of the heatmap.
            3.2 Cover the pixel with a 5x5 window and locate the maximum value in that
                area.
            3.3 Substitute the center-pixel value with that maximum.
            3.4 Stride the window by one pixel and repeat until the entire heatmap has
                been processed.
            3.5 Contrast the output with the initial heatmap: the pixels that retain
                the same value are the points being searched for; set all other pixels
                to 0.
        Step 4: Generate a complete bipartite graph.
        Step 5: Apply the line integral.
        Step 6: Generate a bipartite weighted graph.
        Step 7: Run the assignment algorithm:
            7.1 Sort the candidate connections by score.
            7.2 The connection with the highest score is accepted as a final
                connection.
            7.3 Consider the next candidate: if none of its parts has already been
                assigned to a final connection, accept it as a final connection.
            7.4 Repeat step 7.3 until finished.
    end
    Step 8: Merging: transform the detected connections into the final skeletons.
    Step 9: Key points are identified.
    Step 10: Two of them (neck and hip) are selected.
    Step 11: The angle is calculated according to the law of cosines.
    Step 12: The angle is converted to a vector along with each frame's information.
end
Algorithm 1: Angle finding through OpenPose
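The law-of-cosines computation of Step 11 can be sketched as follows; lean_angle is a
hypothetical helper that takes three (x, y) keypoints and returns the angle, in
degrees, at the middle keypoint (for example the neck).

```python
import math

def lean_angle(a, b, c):
    """Angle at keypoint b formed by keypoints a and c, via the law of
    cosines: cos(B) = (|ab|^2 + |bc|^2 - |ac|^2) / (2 |ab| |bc|)."""
    ab = math.dist(a, b)
    bc = math.dist(b, c)
    ac = math.dist(a, c)
    cos_b = (ab ** 2 + bc ** 2 - ac ** 2) / (2 * ab * bc)
    # Clamp against floating-point drift before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))
```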
Table 1: Keypoints Labeling
Keypoint Body part
0 Nose
1 Neck
2 Right Shoulder
3 Right Elbow
4 Right Wrist
5 Left Shoulder
6 Left Elbow
7 Left Wrist
8 Right Hip
9 Right Knee
10 Right Ankle
11 Left Hip
12 Left Knee
13 Left Ankle
3.5. Classification
After the keypoints are detected, they are fed to a classification algorithm to find
the problematic frame, and this step is evaluated using different machine learning
algorithms. K-NN and SVM [23] classifiers were used to classify the frames. K-NN is
based on feature similarity. The SVM classifier works on a wide range of classification
problems that are high dimensional in nature; however, SVM requires fine tuning of its
key parameters to achieve good classification accuracy.
3.5.1. K-NN Classification
K-Nearest Neighbors [24] is a lazy algorithm that stores all instances and classifies
unknown instances based on a similarity measure. KNN has been widely used in pattern
recognition and estimation problems. KNN makes a prediction for a new instance x by
searching the entire set of stored instances for the K most similar instances and
assigning their majority label to the unknown instance. The similarity is measured
through different distance measures, such as the Euclidean and Manhattan distances.
\text{Euclidean} = \sqrt{\sum_{i=1}^{k}(x_i - y_i)^2} \qquad (3)

\text{Manhattan} = \sum_{i=1}^{k}|x_i - y_i| \qquad (4)
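A minimal NumPy sketch of KNN prediction with the two distance measures of Eqs. (3)
and (4) is shown below; the helper name is an assumption, and k = 7 matches the value
used in Section 4.2.1.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=7, metric="euclidean"):
    """Classify x by majority vote among its k nearest stored instances,
    using either the Euclidean (Eq. 3) or Manhattan (Eq. 4) distance."""
    diff = X_train - x
    if metric == "euclidean":
        d = np.sqrt((diff ** 2).sum(axis=1))
    else:  # manhattan
        d = np.abs(diff).sum(axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```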
3.5.2. SVM Classifier
A Support Vector Machine (SVM) [25] constructs a hyperplane, or a set of hyperplanes,
in a high- or infinite-dimensional space that can be used for classification or
regression. Intuitively, the hyperplane that has the greatest distance to the closest
training data point of any class (the so-called functional margin) achieves a strong
separation, because in general, the greater the margin, the lower the classifier's
generalization error. SVM uses kernel methods to address non-linear problems. A kernel
method is an algorithm that depends on the data only through dot products; where this
is the case, the dot product can be replaced by a kernel function that computes a dot
product in some possibly high-dimensional feature space. Kernel methods make it
possible to generate non-linear decision boundaries using linear classifier machinery,
and they also allow the classifier to handle data that has no obvious representation
in a fixed-dimensional vector space.
\text{Gaussian: } K(x, y) = \exp\!\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right) \qquad (5)
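The kernel of Eq. (5) can be written directly; the function name and the default
sigma are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel of Eq. (5): K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))
```

The kernel equals 1 when x = y and decays toward 0 as the points move apart, with
sigma controlling how quickly similarity falls off.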
3.5.3. CNN-based classification
CNN [26] is the most popular deep learning architecture. Our study found that CNN can
classify keypoint frames more easily than KNN and SVM. Every node of the neural network
develops its own knowledge of rules and functionality through the experience gained
during training.
The CNN architecture used is shown in Figure 6. It consists of 23 layers in total: 10
convolutional layers, 4 max-pooling layers, 4 fully connected layers, 2 batch
normalization layers, 2 ReLU layers, and one tanh layer. The first and second layers
are convolutional layers with 64 kernel filters of size 3x3, followed by a max-pooling
layer. The fourth and fifth layers are again convolutional layers, with 128 kernel
filters of size 3x3, and the sixth layer is a max-pooling layer. Similarly, the seventh
and eighth layers are convolutional layers with 256 kernel filters of size 3x3,
followed by a max-pooling layer. The next four layers are convolutional layers with 512
kernel filters of size 3x3, followed by a max-pooling layer. The next two layers are
fully connected (FC) layers of size 4096. Layers seventeen and twenty are batch
normalization layers, which normalize the data in batches. Layers eighteen and
twenty-one are ReLU (Rectified Linear Unit) activations, f(x) = max(0, x). Layers
nineteen and twenty-two are FC layers of size 4096x500 and 4096x3 respectively. The
last layer is the non-linear function tanh, with tanh(x) = 2σ(2x) − 1.
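A hedged PyTorch sketch of this 23-layer network is given below. The input resolution
(64x64), the exact FC dimensions after flattening, and the output size of four (one
per risk class) are assumptions, since the stated 4096x500 and 4096x3 layer sizes do
not compose directly; the layer counts (10 conv, 4 max-pool, 4 FC, 2 batch norm, 2
ReLU, 1 tanh) follow the description above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n):
    """n 3x3 convolutions (padding preserves size) followed by 2x2 max-pooling."""
    layers = []
    for i in range(n):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1))
    layers.append(nn.MaxPool2d(2))
    return layers

class PostureCNN(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),      # layers 1-3
            *conv_block(64, 128, 2),    # layers 4-6
            *conv_block(128, 256, 2),   # layers 7-9
            *conv_block(256, 512, 4),   # layers 10-14
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096),        # FC-4096 (layer 15)
            nn.Linear(4096, 4096),      # FC-4096 (layer 16)
            nn.BatchNorm1d(4096),       # layer 17
            nn.ReLU(),                  # layer 18
            nn.Linear(4096, 500),       # layer 19
            nn.BatchNorm1d(500),        # layer 20
            nn.ReLU(),                  # layer 21
            nn.Linear(500, n_classes),  # layer 22
            nn.Tanh(),                  # layer 23
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```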
3.5.4. YOLO
YOLO [27, 28] uses a CNN to perform object detection in real time. In YOLO, a single
neural network is applied to the full image; the network divides the image into regions
and predicts bounding boxes with confidence scores. Based on the risk levels of
cervical spondylosis, different target objects were created from the angles calculated
according to the law of cosines using the OpenPose library (neck and hip positions). In
YOLO, target object detection is performed on every frame of the video, and the
duration of continuous time spent in a particular position is also calculated.
Figure 6: CNN-Architecture
The YOLO architecture contains 23 convolutional layers, 5 max-pooling layers, two route
layers, a reorg layer and a detection layer. The network starts with a convolutional
layer with 32 kernel filters of size 3x3/1, followed by a max-pooling layer of size
2x2/1. A convolutional layer and a max-pooling layer are then repeated, with 64 kernel
filters of size 3x3/1 and 2x2/1 respectively. The next three layers are convolutional
layers with 128, 64 and 128 kernel filters of sizes 3x3/1, 1x1/1 and 3x3/1
respectively, followed by a max-pooling layer of size 2x2/1. Again, the next three
layers are convolutional layers with 256, 128 and 256 kernel filters of sizes 3x3/1,
1x1/1 and 3x3/1 respectively, followed by a max-pooling layer of size 2x2/1. The next
five layers are convolutional layers whose alternating layers have 512 and 256 kernel
filters of sizes 3x3/1 and 1x1/1 respectively. The last max-pooling layer has size
2x2/1 and is followed by four convolutional layers with alternating kernel filters of
1024 and 512 and sizes 3x3/1 and 1x1/1 respectively. The next three layers are also
convolutional layers with 1024 kernel filters of size 3x3/1, followed by one of the
route layers, which performs concatenation; this layer has 16 kernel filters. The
twenty-sixth layer is a convolutional layer with 64 kernel filters of size 1x1/1. Next
is a reorg layer, which reshapes the feature map, decreasing its size and increasing
the number of channels without changing the elements. This is succeeded by the second
route layer and then, as the third-last and second-last layers, convolutional layers
with 1024 and 40 kernel filters of sizes 3x3/1 and 1x1/1 respectively. The last layer
of the architecture is the detection layer.
3.6. Implementation
The developed system collects data using a web camera placed either to the left or
right side of the person. The posture is captured continuously (up to 3-4 hours) at a
frame rate of 17 frames per second. After recording, the video is split into frames,
each with dimensions of 640 x 480 pixels.
The first phase finds the keyframe and the time duration of the respective changed
frames in a captured video. A keyframe is a frame that has changed with respect to the
previous frame, and the time duration is the time taken for the change to occur. For a
particular period the subject in the captured video sits straight, and after a certain
time the subject changes his/her sitting position to a leaning position; the system
then calculates the time duration between the sitting frame and the changed leaning
frame and identifies the key frame. Of the three image matching techniques considered,
the Mean Squared Error (MSE) method, the Reduced Average method and the keypoint
descriptor method, the keypoint descriptor method gives the most accurate result, as it
considers dissimilar frames. Comparing SIFT, SURF and ORB, ORB extracts keypoints more
efficiently than the others. The key frames identified are used as input to the
OpenPose library for post-processing.
OpenPose [20] is the first real-time multi-person system to detect, on single images, a
total of 135 key points corresponding to the human body, hands, face and feet. Of these
135 keypoints, this study considers only the key points of the neck and hip (spine)
positions. Using the key points at these positions, lean angles are computed by the law
of cosines, and then the images are tested against the model. For testing, the two
angles and the time factors are considered.
For classification, the posture position, the duration of the posture, and the
continuity of the posture over a period are collected. Based on this information,
posture classification is performed to estimate the various risk levels of Cervical
Spondylosis. For the classification task, the CNN-based method is shown to have the
best performance.
4. Result Analysis
4.1. Performance of different descriptors to obtain the best frame
Table 2 lists the performance evaluation of the different keypoint detection
techniques. Performance was evaluated through the efficiency of each algorithm for
images with varying intensity values and for augmented images (rotated, scaled and
sheared). Each case is evaluated through the time needed to extract each keypoint
descriptor, the number of keypoints detected in subsequent images/frames, the number of
matches, and the average matching rate. The overall match rate of ORB outperforms both
SURF and SIFT, and the computational time required for the ORB key descriptor is the
lowest in all cases. For image augmentation with rotation, ORB provides better results
than the other two algorithms. For scaling, the image was scaled by a factor of two to
show the effect of scale on matching; here the highest average match rate is for SIFT
and the lowest for ORB. The original image was sheared with a value of 0.5, and ORB has
the highest overall match rate.
Table 2: Performance Evaluation

                                        ORB      SURF     SIFT
Images with varying intensity
  Time (sec)                            0.03     0.04     0.13
  Keypoints detected in first image     248      162      261
  Keypoints detected in second image    229      166      267
  No. of matches                        183      119      168
  Average match rate (%)                76.7     72.6     63.6
Image with its rotated image
  Time (sec)                            0.03     0.03     0.16
  Keypoints detected in first image     248      162      261
  Keypoints detected in second image    260      271      423
  No. of matches                        166      110      158
  Average match rate (%)                65.4     50.8     46.2
Image with its scaled image
  Time (sec)                            0.02     0.08     0.25
  Keypoints detected in first image     248      162      261
  Keypoints detected in second image    1210     581      471
  No. of matches                        232      136      181
  Average match rate (%)                31.8     36.6     49.5
Image with its sheared image
  Time (sec)                            0.026    0.049    0.133
  Keypoints detected in first image     298      162      261
  Keypoints detected in second image    229      214      298
  No. of matches                        150      111      145
  Average match rate (%)                62.89    59.04    51.88
4.2. Performance of different classifiers
Classification is evaluated using different evaluation metrics. Accuracy is the most
commonly used metric; other criteria include precision, sensitivity, False Negative
Rate (FNR), False Positive Rate (FPR) and F1 score. Precision, or PPV (Positive
Predictive Value), measures the fraction of samples predicted positive that are truly
positive. The False Negative Rate (FNR) is defined as the ratio of the number of
positive samples wrongly reported as negative (i.e., false negatives) to the total
number of actual positive samples. The error rate is calculated as the ratio of the
number of incorrect classifications to the amount of test data evaluated, and is
therefore simple to quantify. Specificity, or TNR (True Negative Rate), measures the
fraction of correctly classified negatives, while Sensitivity, also called TPR (True
Positive Rate) or Recall, measures the fraction of correctly classified positives; it
corresponds to the conditional probability of accurately detecting illness through a
diagnostic test. The F1-score is the harmonic mean of the sensitivity and precision
values. The False Positive Rate (FPR) is calculated as the ratio of the number of
wrongly reported positive results (i.e., false positives) to the total number of actual
negatives. A classifier with greater precision, accuracy, specificity, sensitivity, NPV
and F1-score is considered more efficient. These metrics are summarized in Table 3.
Table 3: Evaluation Metrics
Metrics Formula
Sensitivity or recall TPR = TP / (TP + FN)
Specificity SPC = TN / (FP + TN)
Precision PPV = TP / (TP + FP)
False Positive Rate FPR = FP / (FP + TN)
False Discovery Rate FDR = FP / (FP + TP)
False Negative Rate FNR = FN / (FN + TP)
Accuracy ACC = (TP + TN) / (P + N)
F1 Score F1 = 2TP / (2TP + FP + FN)
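The formulas in Table 3 can be computed directly from confusion-matrix counts. A minimal sketch in plain Python, using hypothetical counts (TP = 23, FP = 2, TN = 21, FN = 4) chosen because they reproduce the CNN row of Table 7:

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute the metrics of Table 3 from confusion-matrix counts."""
    p, n = tp + fn, tn + fp  # actual positives / actual negatives
    return {
        "sensitivity": tp / (tp + fn),   # TPR / recall
        "specificity": tn / (fp + tn),   # TNR
        "precision":   tp / (tp + fp),   # PPV
        "fpr":         fp / (fp + tn),
        "fdr":         fp / (fp + tp),
        "fnr":         fn / (fn + tp),
        "accuracy":    (tp + tn) / (p + n),
        "f1":          2 * tp / (2 * tp + fp + fn),
    }

m = evaluation_metrics(tp=23, fp=2, tn=21, fn=4)
print(round(m["accuracy"], 4), round(m["precision"], 4), round(m["f1"], 4))
# 0.88 0.92 0.8846
```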
4.2.1. KNN
The performance of KNN was measured using two distance measures, Euclidean and Manhattan, and is reported in Table 4. These results were obtained with k = 7. Both distances give very similar recall (0.7000 for Euclidean and 0.6977 for Manhattan), while Euclidean distance performs better in accuracy and precision.
Table 4: KNN Performance Evaluation
Accuracy Precision Recall
Euclidean distance 0.7115 0.7778 0.7000
Manhattan distance 0.6982 0.7059 0.6977
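A minimal sketch of the k-NN decision rule with the two distance measures compared above (k = 7 as in the experiment; the toy two-class data are illustrative, not the study's dataset):

```python
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, labels, query, k=7, dist=euclidean):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy data: class 0 near the origin, class 1 near (6, 6).
train = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(knn_predict(train, labels, (0.5, 0.5), k=7))  # 0
```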
4.2.2. SVM
Two different kernels were tried with the SVM. The Gaussian kernel achieved better accuracy (0.8039) and recall (0.8400) than the RBF kernel, while the RBF kernel gave a slight improvement in precision (0.7895 versus 0.7778 for the Gaussian kernel).
Table 5: SVM Performance Evaluation
Accuracy Precision Recall
Gaussian Kernel 0.8039 0.7778 0.8400
RBF Kernel 0.7867 0.7895 0.7895
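The RBF (Gaussian-form) kernel used by such an SVM maps a pair of samples to a similarity in (0, 1]. A minimal sketch; the gamma value is illustrative, as the kernel parameters used in the study are not reported:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2); equals 1 when x == y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((0, 0), (0, 0)))            # 1.0
print(round(rbf_kernel((0, 0), (1, 0)), 4))  # 0.6065
```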
4.2.3. Deep learning
For deep learning we compared the performance of CNN with the YOLO algorithm. Performance was evaluated in two settings, with and without hyper-parameter tuning, as shown in Table 6. Without hyper-parameter tuning, YOLO has a narrow advantage in precision (83.09% versus 82.77% for CNN), whereas with hyper-parameter tuning CNN shows markedly better results than YOLO. These results were obtained at 6000 epochs and are illustrated in Figure 7.
Table 6: Performance Evaluation
Accuracy Precision Recall
With hyper-parameter tuning
CNN 0.8800 0.9200 0.8519
YOLO 0.8700 0.9000 0.8491
Without hyper-parameter tuning
CNN 0.8613 0.8277 0.8224
YOLO 0.8210 0.8309 0.7734
Figure 7: Accuracy vs epochs
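The core operation of a CNN, a valid 2-D convolution (cross-correlation) of an image with a learned kernel, can be sketched in plain Python; the 3x3 input and 2x2 kernel below are illustrative:

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding), summing elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v]
             for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # sums each pixel with its lower-right neighbour
print(conv2d_valid(image, kernel))  # [[6, 8], [12, 14]]
```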
4.3. Overall Performance
Table 7 shows the performance of the various machine learning models under consideration: CNN and YOLO from deep learning, and SVM and K-NN from shallow learning. Comparing CNN, YOLO, SVM and K-NN, CNN is the most accurate.
4.3.1. CNN
CNN [23] is the most popular deep learning architecture. In the present study, CNN combined with the OpenPose library gives the highest accuracy within the limited time duration, and it also leads on the other performance metrics: accuracy 0.8800, precision 0.9200, recall (sensitivity) 0.8519, F1 score 0.8846 and specificity 0.9130. Its FPR, FDR and FNR (0.0870, 0.0800 and 0.1481) were the lowest, indicating a good result. As accuracy, precision, recall and F1 score were all high, the performance of this algorithm can be considered satisfactory.
4.3.2. YOLO
YOLO is a real-time object detection system and also a deep learning method. Although its values are lower than those of CNN, in this study YOLO scored higher than SVM and K-NN, with accuracy, precision, recall, F1 score and specificity of 0.8700, 0.9000, 0.8491, 0.8738 and 0.8491 respectively. Its FPR, FDR and FNR are also lower than those of SVM and K-NN.
4.3.3. SVM
SVM belongs to the general category of kernel methods and is a shallow learning algorithm. Its performance is lower than that of the deep learning algorithms, but better than that of K-NN.
Table 7: Performance Metrics
Algorithm Accuracy Precision Recall F1-score Specificity FPR FDR FNR
CNN 0.8800 0.9200 0.8519 0.8846 0.9130 0.0870 0.0800 0.1481
YOLO 0.8700 0.9000 0.8491 0.8738 0.8491 0.1509 0.1000 0.1509
SVM 0.8039 0.7778 0.8400 0.8077 0.7692 0.2308 0.2222 0.1600
K-NN 0.7115 0.7778 0.7000 0.7368 0.7273 0.2727 0.2222 0.3000
5. Conclusions
This paper summarises the design and development of a system that detects incorrect posture in order to identify cervical spondylosis in its initial stages. Experiments performed to test the proposed system show appreciable accuracy. Datasets were acquired through a web camera, and the necessary pre-processing steps were performed to enhance their quality. The system was trained on pre-recorded data to distinguish between four classes related to cervical spondylosis. The advantage of the system is that end users can evaluate the correctness of their posture in real time, which helps to assess the risk level of CS in a convenient manner.
The system can be further enhanced by performing classification in parallel. Modified deep learning algorithms could be employed to improve accuracy, and the idea of the proposed assistive system could also be extended to a low-cost device.
References
[1] F. Lees, J. A. Turner, Natural history and prognosis of cervical spondylosis, British Medical Journal 2 (5373) (1963) 1607.
[2] A. I. Binder, Cervical spondylosis and neck pain, BMJ 334 (7592) (2007) 527–531.
[3] D. Glew, I. Watt, P. Dieppe, P. Goddard, MRI of the cervical spine: rheumatoid arthritis compared with cervical spondylosis, Clinical Radiology 44 (2) (1991) 71–76.
[4] R. A. Deyo, S. K. Mirza, B. I. Martin, Back pain prevalence and visit rates: estimates from US national surveys, 2002, Spine 31 (23) (2006) 2724–2727.
[5] R. A. Deyo, S. K. Mirza, B. I. Martin, Back pain prevalence and visit rates: estimates from US national surveys, 2002, Spine 31 (23) (2006) 2724–2727.
[6] L. Brain, M. Wilkinson, Cervical Spondylosis and Other Disorders of the Cervical Spine, Butterworth-Heinemann, 2013.
[7] C. Heller, P. Stanley, B. Lewis-Jones, R. Heller, Value of X ray examinations of the cervical spine, Br Med J (Clin Res Ed) 287 (6401) (1983) 1276–1278.
[8] S. A. Olarinoye-Akorede, P. O. Ibinaiye, A. Akano, A. U. Hamidu, G. A. Kajogbola, et al., Magnetic resonance imaging findings in cervical spondylosis and cervical spondylotic myelopathy in Zaria, Northern Nigeria, Sub-Saharan African Journal of Medicine 2 (2) (2015) 74.
[9] D. B. Nunez Jr, A. Zuluaga, D. A. Fuentes-Bernardo, L. A. Rivas, J. L. Becerra, Cervical spine trauma: how much more do we learn by routinely using helical CT?, Radiographics 16 (6) (1996) 1307–1318.
[10] P. P. Chitte, U. M. Gokhale, Analysis of different methods for identification and classification of cervical spondylosis (CS): a survey, International Journal of Applied Engineering Research 12 (21) (2017) 11727–11737.
[11] K. Hirano, K. Shoda, K. Kitamura, Y. Miyazaki, Y. Nishida, Method for behavior normalization to enable comparative understanding of interactions of elderly persons with consumer products using a behavior video database, Procedia Computer Science 160 (2019) 409–416.
[12] M. Ariz, A. Villanueva, R. Cabeza, Robust and accurate 2D-tracking-based 3D positioning method: application to head pose estimation, Computer Vision and Image Understanding 180 (2019) 13–22.
[13] E. Ramirez, P. Melin, G. Prado-Arechiga, Hybrid model based on neural networks, type-1 and type-2 fuzzy systems for 2-lead cardiac arrhythmia classification, Expert Systems with Applications 126 (2019) 295–307.
[14] I. Miramontes, J. C. Guzman, P. Melin, G. Prado-Arechiga, Optimal design of interval type-2 fuzzy heart rate level classification systems using the bird swarm algorithm, Algorithms 11 (12) (2018) 206.
[15] P. Melin, I. Miramontes, G. Prado-Arechiga, A hybrid model based on modular neural networks and fuzzy systems for classification of blood pressure and hypertension risk diagnosis, Expert Systems with Applications 107 (2018) 146–164.
[16] O. Castillo, P. Melin, E. Ramírez, J. Soria, Hybrid intelligent system for cardiac arrhythmia classification with fuzzy k-nearest neighbors and neural networks combined with a fuzzy system, Expert Systems with Applications 39 (3) (2012) 2947–2955.
[17] D. G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, Vol. 2, 1999, pp. 1150–1157.
[18] H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, in: A. Leonardis, H. Bischof, A. Pinz (Eds.), Computer Vision – ECCV 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 404–417.
[19] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: an efficient alternative to SIFT or SURF, in: 2011 International Conference on Computer Vision, 2011, pp. 2564–2571.
[20] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, Y. Sheikh, OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields, arXiv preprint arXiv:1812.08008, 2018.
[21] D. Wackerly, W. Mendenhall, R. L. Scheaffer, Mathematical Statistics with Applications, Cengage Learning, 2014.
[22] I. Păvăloi, A. Ignat, Iris image classification using SIFT features, Procedia Computer Science 159 (2019) 241–250.
[23] N. S. A. ALEnezi, A method of skin disease detection using image processing and machine learning, Procedia Computer Science 163 (2019) 85–92.
[24] N. S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician 46 (3) (1992) 175–185. doi:10.1080/00031305.1992.10475879.
[25] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273–297.
[26] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.
[27] D. T. Nguyen, T. N. Nguyen, H. Kim, H.-J. Lee, A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27 (8) (2019) 1861–1873.
[28] Y. Tian, G. Yang, Z. Wang, H. Wang, E. Li, Z. Liang, Apple detection during different growth stages in orchards using the improved YOLO-v3 model, Computers and Electronics in Agriculture 157 (2019) 417–426.