Journal Pre-proofs
Comparative analysis of Machine Learning approaches for early stage Cervi‐
cal Spondylosis detection
M. Sreeraj, Jestin Joy, Manu Jose, Meenu Varghese, T.J. Rejoice
PII: S1319-1578(20)30448-1
DOI: https://doi.org/10.1016/j.jksuci.2020.08.010
Reference: JKSUCI 832
To appear in: Journal of King Saud University - Computer and
Information Sciences
Received Date: 17 April 2020
Accepted Date: 19 August 2020
Please cite this article as: Sreeraj, M., Joy, J., Jose, M., Varghese, M., Rejoice, T.J., Comparative analysis of
Machine Learning approaches for early stage Cervical Spondylosis detection, Journal of King Saud University -
Computer and Information Sciences (2020), doi: https://doi.org/10.1016/j.jksuci.2020.08.010
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover
page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version
will undergo additional copyediting, typesetting and review before it is published in its final form, but we are
providing this version to give early visibility of the article. Please note that, during the production process, errors
may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 Production and hosting by Elsevier B.V. on behalf of King Saud University.
Comparative analysis of Machine Learning approaches
for early stage Cervical Spondylosis detection
Abstract
Cervical Spondylosis (CS) is a chronic spinal condition in which the spine gradually
stiffens and can finally become completely inflexible. It is difficult to diagnose in
its early stages, which delays treatment. The risk level of cervical spondylosis can
be reduced if it is detected in primary care. With this objective, a system is designed
and developed to diagnose and predict the severity of cervical spondylosis in its early
stages. Different machine learning techniques are evaluated for this task, and the
results indicate that machine learning can provide a low-cost and accurate mechanism
for early-stage spondylosis detection.
Keywords: Cervical Spondylosis (CS), CNN, ORB, Sitting posture, Machine Learning.
1. Introduction
A common cause of spinal cord dysfunction in elderly persons is Cervical Spondylosis
(CS). Lifestyle, age, occupation and many other factors can accelerate the progression
of spine-related diseases. Early detection followed by medication can help to reduce
the risk level of the disease. Technology to detect cervical spondylosis in its early
stages is largely nonexistent, since the condition is very difficult to diagnose in its
primary stage.
Cervical Spondylosis (CS) results from chronic deterioration of the vertebrae and discs
of the neck. The bony projections that form along joints are referred to as bone spurs
or osteophytes. The disease is also often linked with arthritis [1, 2, 3]. In cervical
spondylosis, the degree of pain differs from individual to individual. By practising a
healthy lifestyle, one can reduce the risk of CS.
According to a recent study [4] by the Office for National Statistics [5], the number
of people with lower back pain and neck pain is growing every day, affecting their
ability to work. Reports indicate that around 31% of men and 20% of women face this
issue. Lower back pain is a problem faced by employees who sit for long hours in a
particular posture, and it reduces their ability to work efficiently. Timely detection
of the bad postures that cause lower back pain helps to diagnose and treat the problem.
Recent studies have shown that sitting habits have great significance for physical
health. Different measuring tools are used to evaluate neck pain caused by cervical
spondylosis, but these systems take more time to detect the condition and have lower
accuracy. This calls for a better solution to detect and predict the risk level of
cervical spondylosis. A low-cost, fast and efficient system is proposed in this paper.
The system presents a detection method using data from a camera. The overall working
of the system is as follows. A camera is placed either to the right or left side of
the person and continuously records video of the person sitting in a chair for around
3-4 hours. The video is then split into frames, and the similarity between the
resulting images is calculated. If a frame recurs in more than 60% of the video, it
is selected for further processing. The selected frame is tested using the machine
learning model. The images are classified into four different classes: normal, mild,
average and high risk levels. Key points are identified, those of the cervical spine
and lower back are extracted, and angles calculated from them are used for
classification.
The proposed system can be used by the general public to find out their chances of
developing cervical spondylosis at an early stage by inputting a video of their sitting
posture. The system predicts the level of risk involved using machine learning
algorithms. Different algorithms based on shallow learning and deep learning techniques
are evaluated for this purpose.
2. Literature Review
Cervical Spondylosis (CS) is usually asymptomatic, but may present with symptoms such
as neck pain, stiffness, shoulder pain, severe joint pain in the arms, and altered
tactile sensation. These point to age-related, chronic degenerative CS of the discs.
Neck pain is one of the most common issues among patients with CS. Clinical
examination, spinal angiography, X-ray, MRI scan and computed tomography [6] are the
diagnostic methods currently used. Among these, X-ray is widely used because it is cost
effective and involves low radiation. The main issue with X-ray evaluation, however, is
its low accuracy: the result depends on the experience and knowledge of the clinicians
[7]. Different clinicians may therefore give different clinical reports based on the
same X-ray analysis, which makes it difficult to diagnose the risk level of CS.
MRI images can also be used to diagnose CS. MRI differs from X-ray in that it does not
involve ionizing radiation. The computer produces cross-sectional images of the body
that are converted into three-dimensional (3-D) images of the scanned area. This helps
to pinpoint problems in the cervical spine when the scan focuses on that area. MRI
images show the details of soft tissues such as cartilage and nerve roots, and this
test can show spinal compression more clearly than X-rays. An MRI scan of the cervical
spine is ordered only if the pain persists even after normal treatment. One drawback of
MRI is that a scan takes around 30-45 minutes. There are also preconditions: before
performing an MRI scan, the doctor must ensure that the person is not diabetic, does
not have kidney problems, and is not in the first trimester of pregnancy; in these
cases the doctor must take extra care [8].
A cervical spine CT scan [9] is a medical tool that builds a visual image of the
cervical spine using advanced X-ray equipment and computer imaging. The cervical spine
is the part of the spine passing through the neck, which is why the examination is
often called a neck CT scan. By testing bone density, it can help a doctor assess the
severity of bone disorders, such as arthritis or CS, and classify them. A normal X-ray
introduces a small amount of radiation into the patient's body. Bone and soft tissue
absorb radiation differently, so they appear on the X-ray film in different shades. A
CT scan works similarly, but many X-rays are taken in a spiral fashion instead of one
flat image, providing more information and accuracy. When the patient is inside the
scanner, several X-ray beams pass in a circular motion across the upper torso and neck
while electronic X-ray detectors measure the radiation the body absorbs. This
information is interpreted by a computer to produce separate images, called slices,
which are then combined to create a 3-D model of the cervical spine.
Computer technology has advantages such as early stage diagnosis, less time and effort,
cost effectiveness, and higher efficiency and accuracy. Machine learning based
algorithms [10] for detecting Cervical Spondylosis have been discussed in the
literature; most of these methods use MRI images for classification.
Kei Hirano et al. [11] proposed a novel approach for quantitative analysis of the
relationships between elderly patients and consumer goods. The authors introduced
three functions: robust pose estimation, standardization, and a clustering process.
It specifically supports elderly people with physical and cognitive impairment (e.g.
dementia). Mikel Ariz et al. [12] proposed a method for head pose estimation using 2D
tracking of the face, enhancing both 2D point tracking and 3D pose estimation. A
baseline scheme for pose estimation is exploited, and a novel weighted variant of the
POSIT algorithm is proposed in this work.
Eduardo Ramirez et al. [13] proposed a hybrid model as a classification method for
2-lead cardiac arrhythmias, developed using artificial neural networks and fuzzy logic.
Ivette Miramontes et al. [14] describe the optimal design of type-1 and interval type-2
fuzzy systems, designed and optimized first with trapezoidal membership functions and
second with Gaussian membership functions; the Crow Search Algorithm and the Bird Swarm
Algorithm were compared for performance. P. Melin, I. Miramontes et al. [15] proposed a
hybrid model using modular networks and a fuzzy system developed for hypertension risk
diagnosis; the modular network shows a learning accuracy of 98%, 97.62% and 97.83% in
the first, second and third modules respectively. O. Castillo, P. Melin et al. [16]
explained a hybrid intelligent system for arrhythmia classification that combines fuzzy
KNN and neural networks with a Mamdani fuzzy system. The methods used for
classification were Fuzzy K-Nearest Neighbors, Multi Layer Perceptron with Gradient
Descent with momentum, and Multi Layer Perceptron with Scaled Conjugate Gradient
Backpropagation; 98% accuracy was obtained using the Mamdani type fuzzy inference
system.
3. Design and Implementation
The proposed system is a patient-assistive system that can be used to detect the onset
of cervical spondylosis in its early stages. Both shallow learning and deep learning
techniques are evaluated for this.
The captured videos of the individuals are used to find out the chances of developing
cervical spondylosis. 72 volunteers were recruited for collecting data. They were
divided into four classes of 18 persons each. Individuals in each of these four classes
were asked to sit in a prescribed position (normal, mild, average or high), denoting
the chances of developing cervical spondylosis. Their positions were evaluated by a
medical practitioner who is an expert in the field of cervical spondylosis. Across
these four classes, a 6-fold cross-validation method was used for evaluation: 83.33%
(60 videos) for training and the remaining 16.66% (12 videos) for testing. The block
diagram of the proposed system is shown in Figure 1.
Figure 1: Block diagram of the proposed system
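The 6-fold split described above can be sketched as follows; the function name and the
fixed random seed are assumptions made for illustration, with the 72 videos represented
by their indices.

```python
import numpy as np

def six_fold_splits(n_videos=72, n_folds=6, seed=0):
    """Partition video indices into 6 folds; each fold holds out 12
    videos (16.66%) for testing and trains on the other 60 (83.33%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_videos)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test
```

Each video appears in the test partition of exactly one fold, so every sample is used
for evaluation once across the six folds.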
The main objective of the first method is to find the best frame, i.e. the frame that
best captures the sitting position of a person, based on the time duration the person
sits in a specific posture. To obtain the best frame, video is captured and split into
different frames containing the key frames. Different methods such as the Mean Squared
Error (MSE) method, the Reduced Average Method and keypoint descriptor methods are
evaluated for this. The best performance is obtained by keypoint descriptor methods
such as SIFT [17], SURF [18] and ORB [19].
Keypoints are detected from the Part Affinity Fields (PAFs) [20] representation, which
helps associate body parts with the human in the image. This method can easily identify
key points on the hands, feet and body.
Cervical Spondylosis has four different stages, from normal to severe. Based on these
risk levels, different target objects are created from the angles calculated, according
to the law of cosines, using the key points obtained in the previous stages. Among
shallow learning methods, K-NN and SVM classifiers are studied; in deep learning, YOLO
and CNN are considered. In YOLO, target object detection is performed on every frame of
the video, and the duration of continuous time spent in a particular position is also
calculated. In the CNN-based approach, every frame is passed through the network to
find the key points, and classification is performed on them.
3.1. Mean Squared Error Method (MSE)
MSE [21] is used to select the most frequently repeated frame in a particular interval
of time. At first, the starting frame is selected as the pivot element. If the person
is sitting upright, the sitting angle is assumed to be 90°; when the sitting angle
changes to 60°-65°, an error appears between frames. The MSE between the first frame
and the second frame is therefore computed. If no error or only a small error is found,
the first frame is then compared against the third frame, and this process continues.
When a major change occurs in a frame, that frame is set as the new pivot element.
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \tilde{y})^2 \qquad (1)
Likewise, all of the frames are subjected to MSE computation. Based on these error
values, a pattern is obtained, as shown in Figure 2. The main issue with this method is
that it is sensitive to outliers. This disadvantage of MSE can be overcome by using the
Minimum Mean Square Error (MMSE) or the Reduced Average Method (RAM).
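The MSE comparison of Eq. (1) and the pivot-scanning procedure can be sketched as
follows; the function names and the change threshold are assumptions, with frames
represented as NumPy arrays of equal size.

```python
import numpy as np

def frame_mse(frame_a, frame_b):
    """Mean squared error between two equally sized grayscale frames (Eq. 1)."""
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    return np.mean((a - b) ** 2)

def find_pivots(frames, threshold):
    """Scan frames against the current pivot; when the MSE exceeds the
    threshold, a major change has occurred and that frame becomes the
    new pivot element."""
    pivots = [0]
    for i in range(1, len(frames)):
        if frame_mse(frames[pivots[-1]], frames[i]) > threshold:
            pivots.append(i)
    return pivots
```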
3.2. Reduced Average Method
The key frames obtained using MSE are not accurate either, as MSE is sensitive to
outliers: the result contains frames with both very high and very low error values.
Using the Reduced Average Method, these frames can be discarded, leaving a set of
frames within a bounded range of error values. To obtain a feasible solution, the
average is computed. Based on these values, reduced-average stage 1 is illustrated in
Figure 3. The remaining issue with this technique is an outlier with repeated patterns,
which makes it difficult to find the most frequently repeated frame from the pattern.
Figure 2: Pattern generated after implementing MSE
Figure 3: Reduced average-stage1
From the illustration in Figure 3, it is observed that there exist some peak values,
which can be avoided through normalization. In normalization, the number of frame bins
is calculated from the maximum and minimum errors, and the errors are distributed into
this number of bins as follows:

\text{No. of frames} = \frac{\text{Max. error} - \text{Min. error}}{100} \qquad (2)

From these bins, the key frame is selected as the one with the highest frequency of
error: the most frequently repeated frames share a similar range of errors, as
illustrated in Figure 4, which gives a feasible but not optimal solution. The selected
frames are then subjected to further processing.
Figure 4: Reduced average-stage2
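The binning step of Eq. (2) can be sketched as follows; the function name and the
tie-breaking choice of the first frame in the most populated bin are assumptions.

```python
import numpy as np

def keyframe_by_error_frequency(errors, frame_ids):
    """Bin per-frame error values into (max - min) / 100 bins (Eq. 2)
    and return a frame from the most populated bin, i.e. the most
    frequently recurring error range."""
    errors = np.asarray(errors, dtype=float)
    n_bins = max(1, int((errors.max() - errors.min()) / 100))
    hist, edges = np.histogram(errors, bins=n_bins)
    b = np.argmax(hist)
    in_bin = (errors >= edges[b]) & (errors <= edges[b + 1])
    return frame_ids[np.flatnonzero(in_bin)[0]]
```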
3.3. Keypoint Descriptor Methods
This method considers dissimilar frames for computation. There are different keypoint
descriptor methods, such as SIFT [17], SURF [18] and ORB [19]. Our studies found this
to be the most feasible approach for finding the similarities between frames.
3.3.1. ORB
Oriented FAST and Rotated BRIEF (ORB) [19] is a fusion of the FAST keypoint detector
and the BRIEF descriptor, with many modifications to enhance performance. The top N
keypoints are obtained by applying the Harris corner measure to the keypoints found
through FAST, which yields multi-scale features. A limitation of FAST is that it does
not compute orientation. To improve rotation invariance, moments are computed with x
and y within a circular region of radius r, where r is the size of the patch. The ORB
features are used to find the similarity between images: first the video is split into
frames, and these frames are passed to ORB to find similar images.
The reason for finding similar images is that, when a person sits for a long time in a
similar posture, that posture is probably the comfortable one for that person.
3.3.2. SIFT
The SIFT [17, 22] algorithm consists of four main steps. In the first step, the
Difference of Gaussians (DoG) method is used to estimate scale-space extrema, followed
by keypoint localization in the second step; a refinement is also performed at this
stage to eliminate low-contrast points. In the third step, keypoint orientation is
assigned based on local image gradients. In the final step, a descriptor is computed
for each generated keypoint, serving as its local image descriptor; the generated
descriptor is a function of the gradient magnitude and orientation.
3.3.3. SURF
In the SURF [18] technique, the Difference of Gaussians (DoG) is approximated with the
aid of box filters. The technique uses squares for the approximation because the
convolution is faster when using integral images, and this process can also be
parallelized.
3.4. Keypoint detection
Keypoint detection for finding the features (the angle deviations corresponding to the
hip and neck) with respect to the sitting position of the person in a key frame is
described in Algorithm 1. The keypoint labels corresponding to the human body are
listed in Table 1.
Figure 5: Keypoints Labeling
Data: Key frames
Result: Angles corresponding to the hip and neck
while not at end of the frame sequence do
    Step 1: Preprocessing: convert the image from [0, 255] to [-1, 1]:
        img = img * (2.0 / 255.0) - 1.0
    Step 2: Pass the image through the neural network. The output is a heat map matrix
        and a Part Affinity Fields matrix.
    while not at end of the connections do
        Step 3: Non-maximum suppression (NMS): detect the body parts in the image by
            extracting the part locations (the local maxima) from the heatmap:
            3.1 Start at the first pixel of the heatmap.
            3.2 Cover the pixel with a 5x5 window and locate the maximum value in that
                area.
            3.3 Substitute the center-pixel value with that maximum.
            3.4 Stride the window by one pixel and repeat until the entire heatmap has
                been processed.
            3.5 Contrast the output with the initial heatmap: the pixels that retain
                the same value are the points being searched for; set all other pixels
                to 0.
        Step 4: Generate a complete bipartite graph.
        Step 5: Apply the line integral.
        Step 6: Generate a bipartite weighted graph.
        Step 7: Run the assignment algorithm:
            7.1 Sort the candidate connections by score.
            7.2 The connection with the highest score is accepted as a final
                connection.
            7.3 Consider the next candidate: if none of its parts has already been
                assigned to a final connection, accept it as a final connection.
            7.4 Repeat step 7.3 until finished.
    end
    Step 8: Merging: transform the detected connections into the final skeletons.
    Step 9: Key points are identified.
    Step 10: Two of them (neck and hip) are selected.
    Step 11: The angle is calculated according to the law of cosines.
    Step 12: The angle is converted to a vector along with each frame's information.
end
Algorithm 1: Angle finding through OpenPose
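The law-of-cosines computation of Step 11 can be sketched as follows; lean_angle is a
hypothetical helper that takes three (x, y) keypoints and returns the angle, in
degrees, at the middle keypoint (for example the neck).

```python
import math

def lean_angle(a, b, c):
    """Angle at keypoint b formed by keypoints a and c, via the law of
    cosines: cos(B) = (|ab|^2 + |bc|^2 - |ac|^2) / (2 |ab| |bc|)."""
    ab = math.dist(a, b)
    bc = math.dist(b, c)
    ac = math.dist(a, c)
    cos_b = (ab ** 2 + bc ** 2 - ac ** 2) / (2 * ab * bc)
    # Clamp against floating-point drift before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))
```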
Table 1: Keypoints Labeling
Keypoint Body part
0 Nose
1 Neck
2 Right Shoulder
3 Right Elbow
4 Right Wrist
5 Left Shoulder
6 Left Elbow
7 Left Wrist
8 Right Hip
9 Right Knee
10 Right Ankle
11 Left Hip
12 Left Knee
13 Left Ankle
3.5. Classification
After the keypoints are detected, they are fed to a classification algorithm to find
the problematic frame, and this step is evaluated using different machine learning
algorithms. K-NN and SVM [23] classifiers were used to classify the frames. K-NN is
based on feature similarity. The SVM classifier works on a wide range of classification
problems that are high dimensional in nature; however, SVM requires fine tuning of its
key parameters to achieve good classification accuracy.
3.5.1. K-NN Classification
K-Nearest Neighbors [24] is a lazy algorithm that stores all instances and classifies
unknown instances based on a similarity measure. KNN has been widely used in pattern
recognition and estimation problems. KNN makes a prediction for a new instance x by
searching the entire set of stored instances for the K most similar instances and
assigning their majority label to the unknown instance. The similarity is measured
through different distance measures, such as the Euclidean and Manhattan distances.
\text{Euclidean} = \sqrt{\sum_{i=1}^{k}(x_i - y_i)^2} \qquad (3)

\text{Manhattan} = \sum_{i=1}^{k}|x_i - y_i| \qquad (4)
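A minimal NumPy sketch of KNN prediction with the two distance measures of Eqs. (3)
and (4) is shown below; the helper name is an assumption, and k = 7 matches the value
used in Section 4.2.1.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=7, metric="euclidean"):
    """Classify x by majority vote among its k nearest stored instances,
    using either the Euclidean (Eq. 3) or Manhattan (Eq. 4) distance."""
    diff = X_train - x
    if metric == "euclidean":
        d = np.sqrt((diff ** 2).sum(axis=1))
    else:  # manhattan
        d = np.abs(diff).sum(axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```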
3.5.2. SVM Classifier
A Support Vector Machine (SVM) [25] constructs a hyperplane, or a set of hyperplanes,
in a high- or infinite-dimensional space that can be used for classification or
regression. Intuitively, the hyperplane that has the greatest distance to the closest
training data point of any class (the so-called functional margin) achieves a strong
separation, because in general, the greater the margin, the lower the classifier's
generalization error. SVM uses kernel methods to address non-linear problems. A kernel
method is an algorithm that depends on the data only through dot products; where this
is the case, the dot product can be replaced by a kernel function that computes a dot
product in some possibly high-dimensional feature space. Kernel methods make it
possible to generate non-linear decision boundaries using linear classifier machinery,
and they also allow the classifier to handle data that has no obvious representation
in a fixed-dimensional vector space.
\text{Gaussian: } K(x, y) = \exp\!\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right) \qquad (5)
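The kernel of Eq. (5) can be written directly; the function name and the default
sigma are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel of Eq. (5): K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))
```

The kernel equals 1 when x = y and decays toward 0 as the points move apart, with
sigma controlling how quickly similarity falls off.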
3.5.3. CNN-based classification
CNN [26] is the most popular deep learning architecture. Our study found that CNN can
classify keypoint frames more easily than KNN and SVM. Every node of the neural network
develops its own knowledge of rules and functionality through the experience gained
during training.
The CNN architecture used is shown in Figure 6. It consists of 23 layers in total: 10
convolutional layers, 4 max-pooling layers, 4 fully connected layers, 2 batch
normalization layers, 2 ReLU layers, and one tanh layer. The first and second layers
are convolutional layers with 64 kernel filters of size 3x3, followed by a max-pooling
layer. The fourth and fifth layers are again convolutional layers, with 128 kernel
filters of size 3x3, and the sixth layer is a max-pooling layer. Similarly, the seventh
and eighth layers are convolutional layers with 256 kernel filters of size 3x3,
followed by a max-pooling layer. The next four layers are convolutional layers with 512
kernel filters of size 3x3, followed by a max-pooling layer. The next two layers are
fully connected (FC) layers of size 4096. Layers seventeen and twenty are batch
normalization layers, which normalize the data in batches. Layers eighteen and
twenty-one are ReLU (Rectified Linear Unit) activations, f(x) = max(0, x). Layers
nineteen and twenty-two are FC layers of size 4096x500 and 4096x3 respectively. The
last layer is the non-linear function tanh, with tanh(x) = 2σ(2x) − 1.
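A hedged PyTorch sketch of this 23-layer network is given below. The input resolution
(64x64), the exact FC dimensions after flattening, and the output size of four (one
per risk class) are assumptions, since the stated 4096x500 and 4096x3 layer sizes do
not compose directly; the layer counts (10 conv, 4 max-pool, 4 FC, 2 batch norm, 2
ReLU, 1 tanh) follow the description above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n):
    """n 3x3 convolutions (padding preserves size) followed by 2x2 max-pooling."""
    layers = []
    for i in range(n):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1))
    layers.append(nn.MaxPool2d(2))
    return layers

class PostureCNN(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),      # layers 1-3
            *conv_block(64, 128, 2),    # layers 4-6
            *conv_block(128, 256, 2),   # layers 7-9
            *conv_block(256, 512, 4),   # layers 10-14
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096),        # FC-4096 (layer 15)
            nn.Linear(4096, 4096),      # FC-4096 (layer 16)
            nn.BatchNorm1d(4096),       # layer 17
            nn.ReLU(),                  # layer 18
            nn.Linear(4096, 500),       # layer 19
            nn.BatchNorm1d(500),        # layer 20
            nn.ReLU(),                  # layer 21
            nn.Linear(500, n_classes),  # layer 22
            nn.Tanh(),                  # layer 23
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```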
3.5.4. YOLO
YOLO [27, 28] uses a CNN to perform object detection in real time. In YOLO, a single
neural network is applied to the full image; the network divides the image into regions
and predicts bounding boxes with confidence scores. Based on the risk levels of
cervical spondylosis, different target objects were created from the angles calculated
according to the law of cosines using the OpenPose library (neck and hip positions). In
YOLO, target object detection is performed on every frame of the video, and the
duration of continuous time spent in a particular position is also calculated.
Figure 6: CNN-Architecture
The YOLO architecture contains 23 convolutional layers, 5 max-pooling layers, two route
layers, a reorg layer and a detection layer. The network starts with a convolutional
layer with 32 kernel filters of size 3x3/1, followed by a max-pooling layer of size
2x2/1. A convolutional layer and a max-pooling layer are then repeated, with 64 kernel
filters of size 3x3/1 and 2x2/1 respectively. The next three layers are convolutional
layers with 128, 64 and 128 kernel filters of sizes 3x3/1, 1x1/1 and 3x3/1
respectively, followed by a max-pooling layer of size 2x2/1. Again, the next three
layers are convolutional layers with 256, 128 and 256 kernel filters of sizes 3x3/1,
1x1/1 and 3x3/1 respectively, followed by a max-pooling layer of size 2x2/1. The next
five layers are convolutional layers whose alternating layers have 512 and 256 kernel
filters of sizes 3x3/1 and 1x1/1 respectively. The last max-pooling layer has size
2x2/1 and is followed by four convolutional layers with alternating kernel filters of
1024 and 512 and sizes 3x3/1 and 1x1/1 respectively. The next three layers are also
convolutional layers with 1024 kernel filters of size 3x3/1, followed by one of the
route layers, which performs concatenation; this layer has 16 kernel filters. The
twenty-sixth layer is a convolutional layer with 64 kernel filters of size 1x1/1. Next
is a reorg layer, which reshapes the feature map, decreasing its size and increasing
the number of channels without changing the elements. This is succeeded by the second
route layer and then, as the third-last and second-last layers, convolutional layers
with 1024 and 40 kernel filters of sizes 3x3/1 and 1x1/1 respectively. The last layer
of the architecture is the detection layer.
3.6. Implementation
The developed system collects data using a web camera placed either to the left or
right side of the person. The posture is captured continuously (up to 3-4 hours) at a
frame rate of 17 frames per second. After recording, the video is split into frames,
each with dimensions of 640 x 480 pixels.
The first phase finds the keyframe and the time duration of the respective changed
frames in a captured video. A keyframe is a frame that has changed with respect to the
previous frame, and the time duration is the time taken for the change to occur. For a
particular period the subject in the captured video sits straight, and after a certain
time the subject changes his/her sitting position to a leaning position; the system
then calculates the time duration between the sitting frame and the changed leaning
frame and identifies the key frame. Of the three image matching techniques considered,
the Mean Squared Error (MSE) method, the Reduced Average method and the keypoint
descriptor method, the keypoint descriptor method gives the most accurate result, as it
considers dissimilar frames. Comparing SIFT, SURF and ORB, ORB extracts keypoints more
efficiently than the others. The key frames identified are used as input to the
OpenPose library for post-processing.
OpenPose [20] is the first real-time multi-person system to detect, on single images, a
total of 135 key points corresponding to the human body, hands, face and feet. Of these
135 keypoints, this study considers only the key points of the neck and hip (spine)
positions. Using the key points at these positions, lean angles are computed by the law
of cosines, and then the images are tested against the model. For testing, the two
angles and the time factors are considered.
For classification, the posture position, the duration of the posture, and the
continuity of the posture over a period are collected. Based on this information,
posture classification is performed to estimate the various risk levels of Cervical
Spondylosis. For the classification task, the CNN-based method is shown to have the
best performance.
4. Result Analysis
4.1. Performance of different descriptors to obtain the best frame
Table 2 lists the performance evaluation of the different keypoint detection
techniques. Performance was evaluated through the efficiency of each algorithm for
images with varying intensity values and for augmented images (rotated, scaled and
sheared). Each case is evaluated through the time needed to extract each keypoint
descriptor, the number of keypoints detected in subsequent images/frames, the number of
matches, and the average matching rate. The overall match rate of ORB outperforms both
SURF and SIFT, and the computational time required for the ORB key descriptor is the
lowest in all cases. For image augmentation with rotation, ORB provides better results
than the other two algorithms. For scaling, the image was scaled by a factor of two to
show the effect of scale on matching; here the highest average match rate is for SIFT
and the lowest for ORB. The original image was sheared with a value of 0.5, and ORB has
the highest overall match rate.
Table 2: Performance Evaluation

                                        ORB      SURF     SIFT
Images with varying intensity
  Time (sec)                            0.03     0.04     0.13
  Keypoints detected in first image     248      162      261
  Keypoints detected in second image    229      166      267
  No. of matches                        183      119      168
  Average match rate (%)                76.7     72.6     63.6
Image with its rotated image
  Time (sec)                            0.03     0.03     0.16
  Keypoints detected in first image     248      162      261
  Keypoints detected in second image    260      271      423
  No. of matches                        166      110      158
  Average match rate (%)                65.4     50.8     46.2
Image with its scaled image
  Time (sec)                            0.02     0.08     0.25
  Keypoints detected in first image     248      162      261
  Keypoints detected in second image    1210     581      471
  No. of matches                        232      136      181
  Average match rate (%)                31.8     36.6     49.5
Image with its sheared image
  Time (sec)                            0.026    0.049    0.133
  Keypoints detected in first image     298      162      261
  Keypoints detected in second image    229      214      298
  No. of matches                        150      111      145
  Average match rate (%)                62.89    59.04    51.88
4.2. Performance of different classifiers
Classification is evaluated using different evaluation metrics. Accuracy is the most
commonly used metric; other criteria include precision, sensitivity, False Negative
Rate (FNR), False Positive Rate (FPR) and F1 score. Precision, or PPV (Positive
Predictive Value), measures the fraction of samples predicted positive that are truly
positive. The False Negative Rate (FNR) is defined as the ratio of the number of
positive samples wrongly reported as negative (i.e., false negatives) to the total
number of actual positive samples. The error rate is calculated as the ratio of the
number of incorrect classifications to the amount of test data evaluated, and is
therefore simple to quantify. Specificity, or TNR (True Negative Rate), measures the
fraction of correctly classified negatives, while Sensitivity, also called TPR (True
Positive Rate) or Recall, measures the fraction of correctly classified positives; it
corresponds to the conditional probability of accurately detecting illness through a
diagnostic test. The F1-score is the harmonic mean of the sensitivity and precision
values. The False Positive Rate (FPR) is calculated as the ratio of the number of
wrongly reported positive results (i.e., false positives) to the total number of actual
negatives. A classifier with greater precision, accuracy, specificity, sensitivity, NPV
and F1-score is considered more efficient. These metrics are summarized in Table 3.
Table 3: Evaluation Metrics
Metrics Formula
Sensitivity or recall TPR = TP / (TP + FN)
Specificity SPC = TN / (FP + TN)
Precision PPV = TP / (TP + FP)
False Positive Rate FPR = FP / (FP + TN)
False Discovery Rate FDR = FP / (FP + TP)
False Negative Rate FNR = FN / (FN + TP)
Accuracy ACC = (TP + TN) / (P + N)
F1 Score F1 = 2TP / (2TP + FP + FN)
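The formulas in Table 3 can be computed directly from confusion-matrix counts. A minimal sketch in plain Python, using hypothetical counts (TP = 23, FP = 2, TN = 21, FN = 4) chosen because they reproduce the CNN row of Table 7:

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute the metrics of Table 3 from confusion-matrix counts."""
    p, n = tp + fn, tn + fp  # actual positives / actual negatives
    return {
        "sensitivity": tp / (tp + fn),   # TPR / recall
        "specificity": tn / (fp + tn),   # TNR
        "precision":   tp / (tp + fp),   # PPV
        "fpr":         fp / (fp + tn),
        "fdr":         fp / (fp + tp),
        "fnr":         fn / (fn + tp),
        "accuracy":    (tp + tn) / (p + n),
        "f1":          2 * tp / (2 * tp + fp + fn),
    }

m = evaluation_metrics(tp=23, fp=2, tn=21, fn=4)
print(round(m["accuracy"], 4), round(m["precision"], 4), round(m["f1"], 4))
# 0.88 0.92 0.8846
```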
4.2.1. KNN
The performance of KNN was measured using two distance measures, Euclidean and Manhattan, and is reported in Table 4. These results were obtained with k = 7. Both distances give very similar recall (0.7000 for Euclidean and 0.6977 for Manhattan), while Euclidean distance performs better in accuracy and precision.
Table 4: KNN Performance Evaluation
Accuracy Precision Recall
Euclidean distance 0.7115 0.7778 0.7000
Manhattan distance 0.6982 0.7059 0.6977
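A minimal sketch of the k-NN decision rule with the two distance measures compared above (k = 7 as in the experiment; the toy two-class data are illustrative, not the study's dataset):

```python
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, labels, query, k=7, dist=euclidean):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy data: class 0 near the origin, class 1 near (6, 6).
train = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(knn_predict(train, labels, (0.5, 0.5), k=7))  # 0
```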
4.2.2. SVM
Two different kernels were tried with the SVM. The Gaussian kernel achieved better accuracy (0.8039) and recall (0.8400) than the RBF kernel, while the RBF kernel gave a slight improvement in precision (0.7895 versus 0.7778 for the Gaussian kernel).
Table 5: SVM Performance Evaluation
Accuracy Precision Recall
Gaussian Kernel 0.8039 0.7778 0.8400
RBF Kernel 0.7867 0.7895 0.7895
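The RBF (Gaussian-form) kernel used by such an SVM maps a pair of samples to a similarity in (0, 1]. A minimal sketch; the gamma value is illustrative, as the kernel parameters used in the study are not reported:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2); equals 1 when x == y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((0, 0), (0, 0)))            # 1.0
print(round(rbf_kernel((0, 0), (1, 0)), 4))  # 0.6065
```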
4.2.3. Deep learning
For deep learning we compared the performance of CNN with the YOLO algorithm. Performance was evaluated in two settings, with and without hyper-parameter tuning, as shown in Table 6. Without hyper-parameter tuning, YOLO has a narrow advantage in precision (83.09% versus 82.77% for CNN), whereas with hyper-parameter tuning CNN shows markedly better results than YOLO. These results were obtained at 6000 epochs and are illustrated in Figure 7.
Table 6: Performance Evaluation
Accuracy Precision Recall
With hyper-parameter tuning
CNN 0.8800 0.9200 0.8519
YOLO 0.8700 0.9000 0.8491
Without hyper-parameter tuning
CNN 0.8613 0.8277 0.8224
YOLO 0.8210 0.8309 0.7734
Figure 7: Accuracy vs epochs
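The core operation of a CNN, a valid 2-D convolution (cross-correlation) of an image with a learned kernel, can be sketched in plain Python; the 3x3 input and 2x2 kernel below are illustrative:

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding), summing elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v]
             for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # sums each pixel with its lower-right neighbour
print(conv2d_valid(image, kernel))  # [[6, 8], [12, 14]]
```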
4.3. Overall Performance
Table 7 shows the performance of the various machine learning models under consideration: CNN and YOLO from deep learning, and SVM and K-NN from shallow learning. Comparing CNN, YOLO, SVM and K-NN, CNN is the most accurate.
4.3.1. CNN
CNN [23] is the most popular deep learning architecture. In the present study, CNN combined with the OpenPose library gives the highest accuracy within the limited time duration, and it also leads on the other performance metrics: accuracy 0.8800, precision 0.9200, recall (sensitivity) 0.8519, F1 score 0.8846 and specificity 0.9130. Its FPR, FDR and FNR (0.0870, 0.0800 and 0.1481) were the lowest, indicating a good result. As accuracy, precision, recall and F1 score were all high, the performance of this algorithm can be considered satisfactory.
4.3.2. YOLO
YOLO is a real-time object detection system and also a deep learning method. Although its values are lower than those of CNN, in this study YOLO scored higher than SVM and K-NN, with accuracy, precision, recall, F1 score and specificity of 0.8700, 0.9000, 0.8491, 0.8738 and 0.8491 respectively. Its FPR, FDR and FNR are also lower than those of SVM and K-NN.
4.3.3. SVM
SVM belongs to the general category of kernel methods and is a shallow learning algorithm. Its performance is lower than that of the deep learning algorithms, but better than that of K-NN.
Table 7: Performance Metrics
Algorithm Accuracy Precision Recall F1-score Specificity FPR FDR FNR
CNN 0.8800 0.9200 0.8519 0.8846 0.9130 0.0870 0.0800 0.1481
YOLO 0.8700 0.9000 0.8491 0.8738 0.8491 0.1509 0.1000 0.1509
SVM 0.8039 0.7778 0.8400 0.8077 0.7692 0.2308 0.2222 0.1600
K-NN 0.7115 0.7778 0.7000 0.7368 0.7273 0.2727 0.2222 0.3000
5. Conclusions
This paper summarises the design and development of a system that detects incorrect posture in order to identify cervical spondylosis in its initial stages. Experiments performed to test the proposed system show appreciable accuracy. Datasets were acquired through a web camera, and the necessary pre-processing steps were performed to enhance their quality. The system was trained on pre-recorded data to distinguish between four classes related to cervical spondylosis. The advantage of the system is that end users can evaluate the correctness of their posture in real time, which helps to assess the risk level of CS in a convenient manner.
The system can be further enhanced by performing classification in parallel. Modified deep learning algorithms could be employed to improve accuracy, and the idea of the proposed assistive system could also be extended to a low-cost device.
References
[1] F. Lees, J. A. Turner, Natural history and prognosis of cervical spondylosis, British Medical Journal 2 (5373) (1963) 1607.
[2] A. I. Binder, Cervical spondylosis and neck pain, BMJ 334 (7592) (2007) 527–531.
[3] D. Glew, I. Watt, P. Dieppe, P. Goddard, MRI of the cervical spine: rheumatoid arthritis compared with cervical spondylosis, Clinical Radiology 44 (2) (1991) 71–76.
[4] R. A. Deyo, S. K. Mirza, B. I. Martin, Back pain prevalence and visit rates: estimates from US national surveys, 2002, Spine 31 (23) (2006) 2724–2727.
[5] R. A. Deyo, S. K. Mirza, B. I. Martin, Back pain prevalence and visit rates: estimates from US national surveys, 2002, Spine 31 (23) (2006) 2724–2727.
[6] L. Brain, M. Wilkinson, Cervical Spondylosis and Other Disorders of the Cervical Spine, Butterworth-Heinemann, 2013.
[7] C. Heller, P. Stanley, B. Lewis-Jones, R. Heller, Value of X ray examinations of the cervical spine, Br Med J (Clin Res Ed) 287 (6401) (1983) 1276–1278.
[8] S. A. Olarinoye-Akorede, P. O. Ibinaiye, A. Akano, A. U. Hamidu, G. A. Kajogbola, et al., Magnetic resonance imaging findings in cervical spondylosis and cervical spondylotic myelopathy in Zaria, Northern Nigeria, Sub-Saharan African Journal of Medicine 2 (2) (2015) 74.
[9] D. B. Nunez Jr, A. Zuluaga, D. A. Fuentes-Bernardo, L. A. Rivas, J. L. Becerra, Cervical spine trauma: how much more do we learn by routinely using helical CT?, Radiographics 16 (6) (1996) 1307–1318.
[10] P. P. Chitte, U. M. Gokhale, Analysis of different methods for identification and classification of cervical spondylosis (CS): a survey, International Journal of Applied Engineering Research 12 (21) (2017) 11727–11737.
[11] K. Hirano, K. Shoda, K. Kitamura, Y. Miyazaki, Y. Nishida, Method for behavior normalization to enable comparative understanding of interactions of elderly persons with consumer products using a behavior video database, Procedia Computer Science 160 (2019) 409–416.
[12] M. Ariz, A. Villanueva, R. Cabeza, Robust and accurate 2D-tracking-based 3D positioning method: application to head pose estimation, Computer Vision and Image Understanding 180 (2019) 13–22.
[13] E. Ramirez, P. Melin, G. Prado-Arechiga, Hybrid model based on neural networks, type-1 and type-2 fuzzy systems for 2-lead cardiac arrhythmia classification, Expert Systems with Applications 126 (2019) 295–307.
[14] I. Miramontes, J. C. Guzman, P. Melin, G. Prado-Arechiga, Optimal design of interval type-2 fuzzy heart rate level classification systems using the bird swarm algorithm, Algorithms 11 (12) (2018) 206.
[15] P. Melin, I. Miramontes, G. Prado-Arechiga, A hybrid model based on modular neural networks and fuzzy systems for classification of blood pressure and hypertension risk diagnosis, Expert Systems with Applications 107 (2018) 146–164.
[16] O. Castillo, P. Melin, E. Ramírez, J. Soria, Hybrid intelligent system for cardiac arrhythmia classification with fuzzy k-nearest neighbors and neural networks combined with a fuzzy system, Expert Systems with Applications 39 (3) (2012) 2947–2955.
[17] D. G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, Vol. 2, 1999, pp. 1150–1157.
[18] H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, in: A. Leonardis, H. Bischof, A. Pinz (Eds.), Computer Vision – ECCV 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 404–417.
[19] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: an efficient alternative to SIFT or SURF, in: 2011 International Conference on Computer Vision, 2011, pp. 2564–2571.
[20] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, Y. Sheikh, OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields, arXiv preprint arXiv:1812.08008, 2018.
[21] D. Wackerly, W. Mendenhall, R. L. Scheaffer, Mathematical Statistics with Applications, Cengage Learning, 2014.
[22] I. Păvăloi, A. Ignat, Iris image classification using SIFT features, Procedia Computer Science 159 (2019) 241–250.
[23] N. S. A. ALEnezi, A method of skin disease detection using image processing and machine learning, Procedia Computer Science 163 (2019) 85–92.
[24] N. S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician 46 (3) (1992) 175–185. doi:10.1080/00031305.1992.10475879.
[25] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273–297.
[26] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks 3361 (10) (1995) 1995.
[27] D. T. Nguyen, T. N. Nguyen, H. Kim, H.-J. Lee, A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27 (8) (2019) 1861–1873.
[28] Y. Tian, G. Yang, Z. Wang, H. Wang, E. Li, Z. Liang, Apple detection during different growth stages in orchards using the improved YOLO-v3 model, Computers and Electronics in Agriculture 157 (2019) 417–426.