
Comparative analysis of Machine Learning approaches for early stage Cervical Spondylosis detection

Authors:
  • St. George's College Aruvithura

Journal Pre-proofs
Comparative analysis of Machine Learning approaches for early stage Cervical Spondylosis detection
M. Sreeraj, Jestin Joy, Manu Jose, Meenu Varghese, T.J. Rejoice
PII: S1319-1578(20)30448-1
DOI: https://doi.org/10.1016/j.jksuci.2020.08.010
Reference: JKSUCI 832
To appear in: Journal of King Saud University - Computer and
Information Sciences
Received Date: 17 April 2020
Accepted Date: 19 August 2020
Please cite this article as: Sreeraj, M., Joy, J., Jose, M., Varghese, M., Rejoice, T.J., Comparative analysis of
Machine Learning approaches for early stage Cervical Spondylosis detection, Journal of King Saud University -
Computer and Information Sciences (2020), doi: https://doi.org/10.1016/j.jksuci.2020.08.010
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover
page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version
will undergo additional copyediting, typesetting and review before it is published in its final form, but we are
providing this version to give early visibility of the article. Please note that, during the production process, errors
may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 Production and hosting by Elsevier B.V. on behalf of King Saud University.
Abstract
Cervical Spondylosis (CS) is a chronic spinal condition in which the spine gradually stiffens and can eventually become completely inflexible. It is difficult to diagnose in its early stages, which delays treatment. The risk level of cervical spondylosis can be reduced if it is detected in primary care. With this objective, a system is designed and developed to diagnose and predict the severity of cervical spondylosis in its early stages. Different machine learning techniques are evaluated for this purpose, and the results indicate that machine learning techniques can provide a low-cost and accurate mechanism for early-stage spondylosis detection.
Keywords: Cervical Spondylosis (CS), CNN, ORB, Sitting posture, Machine Learning.
1. Introduction
A common cause of spinal cord dysfunction in elderly persons is Cervical Spondylosis (CS). Lifestyle, age, occupation and many other factors can contribute to the rapid growth of spine-related diseases. Early detection followed by medication can help reduce the risk level of the disease. However, technology to detect cervical spondylosis in its early stages is lacking, since the condition is very difficult to diagnose at that stage.
Cervical Spondylosis (CS) results from a chronic deterioration of the vertebrae and discs of the neck. The bony projections that form along the joints are referred to as bone spurs or osteophytes. The disease is also often linked with arthritis [1, 2, 3]. In cervical spondylosis, the degree of pain can differ from individual to individual. By practising a healthy lifestyle, one can reduce the risk of CS.
According to a recent study [4] by the Office for National Statistics [5], the number of people with lower back pain and neck pain is growing every day, and this affects their ability to work. According to these reports, around 31% of men and 20% of women face this issue. Lower back pain is a problem faced by employees who sit for long hours in a particular posture, and it reduces their ability to work efficiently. Timely detection of the bad postures that cause lower back pain helps to diagnose and treat the problem. Recent studies have shown that sitting habits are highly significant for physical health. Different measuring tools are used to evaluate neck pain caused by cervical spondylosis, but these tools are slow and have limited accuracy. This calls for a better solution to detect and predict the risk level of cervical spondylosis. A low-cost, fast and efficient system is proposed in this paper.
Preprint submitted to Elsevier August 8, 2020
The proposed system detects risk using video data from a camera. It works as follows. A camera is placed to the right or left side of the person and continuously records video of the person sitting in a chair. The video, around 3-4 hours long, is split into frames, and the similarity between the frames is calculated. If a frame occurs more than 60% of the time, that frame is selected for further processing. The selected frame is tested using the machine learning model, which classifies the images into four classes: normal, mild, average and high risk levels. Key points are detected, and those of the cervical spine and lower back are extracted. Angles calculated from these key points are used for classification.
The proposed system can be used by the general public to find out their chances of developing cervical spondylosis in its early stages by providing a video of their sitting posture. The system predicts the level of risk involved using machine learning algorithms. Different algorithms based on shallow learning and deep learning techniques are evaluated for this purpose.
2. Literature Review
Cervical Spondylosis (CS) is usually asymptomatic, but it may present with symptoms such as neck pain, stiffness, shoulder pain, severe joint pain in the arms, and altered tactile sensations. Such symptoms point to chronic CS caused by age-related disc degeneration. Neck pain is one of the most common issues among patients with CS. Clinical examination, spinal angiography, X-ray, MRI scan and computed tomography [6] are the diagnostic methods currently used. Among these methods, X-ray is widely used because it is cost effective and involves a low radiation dose. However, the main issue with X-ray evaluation is its low accuracy, as it depends on the experience and knowledge of the clinician [7]. Different clinicians may therefore give different clinical reports based on the same X-ray, which makes it difficult to diagnose the risk level of CS.
MRI images can also be used to diagnose CS. Unlike X-ray, MRI does not use ionizing radiation. The computer produces cross-sectional images of the body that are converted into three-dimensional (3-D) images of the scanned area. This helps to pinpoint problems in the cervical spine when the scan focuses on that area. MRI images show the details of soft tissues such as cartilage and nerve roots, and the test can show spinal compression more clearly than X-rays. An MRI scan of the cervical spine is used only if the pain persists even after normal treatment. One drawback of MRI is that the scan takes around 30-45 minutes; there are also some preconditions. Before performing an MRI scan, the doctor must check whether the person is diabetic, has kidney problems, or is in the first trimester of pregnancy, since these cases require extra care [8].
A cervical spine CT scan [9] is a medical tool that produces a visual image of the cervical spine using advanced X-ray equipment and computer imaging. The cervical spine is the part of the spine that passes through the neck, which is why the examination is often called a neck CT scan. By providing information such as bone density, the scan helps a doctor assess the severity of bone disorders such as arthritis or CS and classify them. A normal X-ray passes a small amount of radiation through the patient's body. Bones and soft tissue absorb radiation differently, so they appear in different shades on the X-ray film. A CT scan works similarly, but many X-rays are taken in a spiral fashion instead of a single flat image, which provides more information and accuracy. While the patient is inside the scanner, several X-ray beams rotate in a circle around the upper torso and neck as electronic X-ray detectors measure the radiation the body absorbs. A computer interprets this information to produce separate images, called slices, which are then combined to create a 3-D model of the cervical spine.
Computer-based approaches offer advantages such as early-stage diagnosis, reduced time and effort, cost effectiveness, and higher efficiency and accuracy. Machine learning based algorithms [10] for detecting cervical spondylosis have been discussed in the literature, and most of these methods use MRI images for classification.
Kei Hirano et al. [11] proposed a novel approach for the quantitative analysis of the relationships between elderly patients and consumer goods. The authors introduced three functions: robust pose estimation, standardization and clustering. The approach specifically supports elderly people with physical and cognitive impairments (e.g. dementia). Mikel Ariz et al. [12] proposed a method for head pose estimation based on 2D face tracking, which also enhances 2D point tracking and 3D pose estimation. That work builds on the baseline pose estimation formulation and proposes a novel weighted variant of the POSIT algorithm.
Eduardo Ramirez et al. [13] proposed a hybrid model for the classification of 2-lead cardiac arrhythmias, built using artificial neural networks and fuzzy logic. Ivette Miramontes et al. [14] describe the optimal design of type-1 and interval type-2 fuzzy systems. Type-1 fuzzy systems are designed and optimized first with trapezoidal membership functions and then with Gaussian membership functions, and the Crow Search Algorithm and the Bird Swarm Algorithm are compared in terms of performance. P. Melin, I. Miramontes et al. [15] proposed a hybrid model using modular networks and a fuzzy system for hypertension risk diagnosis. The modular network shows a learning accuracy of 98%, 97.62% and 97.83% in the first, second and third modules, respectively. O. Castillo, P. Melin et al. [16] described a hybrid intelligent system for arrhythmia classification that combines fuzzy KNN and neural networks with a Mamdani fuzzy system. The classification methods used were Fuzzy K-Nearest Neighbors, a Multi Layer Perceptron trained with Gradient Descent with momentum, and a Multi Layer Perceptron trained with Scaled Conjugate Gradient backpropagation. An accuracy of 98% was obtained using the Mamdani-type fuzzy inference system.
3. Design and Implementation
The proposed system is a patient-assistive system that can be used to detect the onset of cervical spondylosis in its early stages. Both shallow learning and deep learning techniques are evaluated for this purpose.
The captured videos of an individual are used to estimate the chances of developing cervical spondylosis. 72 volunteers were recruited for data collection and divided into four classes of 18 persons each. Individuals in each class were asked to sit in the prescribed position (normal, mild, average or high), which denotes the chance of developing cervical spondylosis. Their positions were evaluated by a medical practitioner who is an expert in the field of cervical spondylosis. Six-fold cross-validation over these four classes was used for evaluation: in each fold, 83.33% of the data (60 videos) was used for training and the remaining 16.67% (12 videos) for testing.
The block diagram of the proposed system is shown in Figure 1.
Figure 1: Block diagram of the proposed system
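The text does not say how the 72 videos were assigned to the six folds. The following Python sketch assumes a stratified, unshuffled split, which yields exactly 60 training and 12 test videos per fold, with 3 videos of each risk class in every test fold. The function name `stratified_folds` is hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=6):
    """Split sample indices into n_folds (train, test) pairs, keeping class balance.

    labels: one class label per video (hypothetical input format).
    Each class's indices are dealt round-robin across the test folds.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    test_folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            test_folds[pos % n_folds].append(idx)

    all_idx = set(range(len(labels)))
    return [(sorted(all_idx - set(test)), sorted(test)) for test in test_folds]


# 72 videos, 18 per risk class, as in the data set described above.
classes = ["normal", "mild", "average", "high"]
labels = [c for c in classes for _ in range(18)]
splits = stratified_folds(labels, n_folds=6)
for train, test in splits:
    print(len(train), len(test))  # 60 12 for every fold
```

Shuffling the indices within each class before dealing them out would give a randomized variant of the same scheme.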
The first method aims to find the best frame, i.e., the frame that best captures the sitting position of a person, based on the time duration a person sits in a specific posture. To obtain the best frame, video is captured and split into frames, among which are the key frames. Different methods, namely the Mean Squared Error (MSE) method, the Reduced Average Method and Keypoint Descriptor Methods, are evaluated for this. The best performance is obtained by Keypoint Descriptor Methods such as SIFT [17], SURF [18] and ORB [19].
Keypoints are detected using the Part Affinity Fields (PAFs) [20] representation, which helps associate body parts with the persons in the image. This method can easily identify key points on the hands, feet and body.
Cervical Spondylosis has four different stages, from normal to severe. Based on these risk levels, different target objects are created from the angles calculated with the law of cosines, using the key points obtained in the previous stages. For shallow learning, K-NN and SVM classifiers are studied; for deep learning, YOLO and CNN are considered. In YOLO, target object detection is performed on every frame of the video, and the duration of continuous time spent in a particular position is also calculated. In the CNN-based approach, every frame is processed to find the key points, and classification is performed on them.
3.1. Mean Squared Error Method (MSE)
MSE [21] is used to select the most repeated frame in a particular interval of time. First, the starting frame is selected as the pivot element. When the person is sitting, the sitting position angle is assumed to be 90°, and when the sitting angle changes to 60°-65°, an error (difference) appears between frames. The MSE between the first and second frames is therefore computed. If there is no error or only a small error, the first frame is compared with the third frame, and the process continues in this way. When a major change occurs, that frame is set as the new pivot element.
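$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \tilde{y}_i\right)^2 \qquad (1)$$
where $y_i$ and $\tilde{y}_i$ are corresponding pixel values of the two frames being compared and $n$ is the number of pixels.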
In this way, all frames are subjected to MSE computation, and the resulting error values form a pattern, as shown in Figure 2. The main issue with this method is that it is sensitive to outliers. This drawback can be overcome by using the Minimum Mean Square Error (MMSE) or the Reduced Average Method (RAM).
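A minimal Python sketch of the pivot-based comparison described in Section 3.1, assuming grayscale frames stored as NumPy arrays. The text does not give the error level that separates a "small" error from a "major change", so `threshold` is a hypothetical parameter, and `pivot_segments` is an illustrative name.

```python
import numpy as np


def mse(a, b):
    """Eq. (1): mean squared error between two equally sized frames."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))


def pivot_segments(frames, threshold):
    """Compare each frame with the current pivot frame.

    A frame whose MSE against the pivot exceeds `threshold` (hypothetical
    tuning parameter) is treated as a major change and becomes the new pivot.
    Returns the pivot indices and the per-frame error pattern.
    """
    pivots, errors = [0], [0.0]
    pivot = frames[0]
    for i in range(1, len(frames)):
        err = mse(frames[i], pivot)
        errors.append(err)
        if err > threshold:
            pivots.append(i)
            pivot = frames[i]
    return pivots, errors


# Synthetic example: five dark frames followed by five bright frames.
frames = [np.zeros((4, 4), np.uint8)] * 5 + [np.full((4, 4), 200, np.uint8)] * 5
pivots, errors = pivot_segments(frames, threshold=100.0)
print(pivots)  # [0, 5]
```

The `errors` list corresponds to the kind of error pattern plotted in Figure 2; a single spike marks each posture change.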
3.2. Reduced Average Method
The key frames obtained using MSE are not accurate either, as MSE is sensitive to outliers: they include frames with both very high and very low error values. The Reduced Average Method discards these frames, leaving a set of frames whose errors lie within a range, and the average of these errors is computed to obtain a feasible solution. Stage 1 of the reduced average method, based on these values, is illustrated in Figure 3.
The issue with this technique is that outliers occur in repeated patterns, which makes it difficult to find the most repeated frame from the pattern.
Figure 2: Pattern generated after implementing MSE
Figure 3: Reduced average-stage1
As Figure 3 shows, some peak values remain, and these can be removed through normalization. In normalization, the number of frames is calculated from the maximum and minimum errors, and the errors are distributed across frame groups as follows:
$$\text{No. of frames} = \frac{\text{Max. error} - \text{Min. error}}{100} \qquad (2)$$
From these groups, the key frame with the highest frequency of error is selected. The most repeated frames have a similar range of errors and are illustrated in Figure 4; this gives a feasible, but not optimal, solution. The selected frames are subjected to further processing.
Figure 4: Reduced average-stage2
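The two stages of the reduced average method can be sketched as follows. This is an interpretation, not the authors' code. The percentile cut-offs used to discard very high and very low errors are assumptions, and Eq. (2) is read as the width of each error interval, so the remaining error range is split into 100 intervals. The function name is illustrative.

```python
import numpy as np


def reduced_average_frames(errors, low_pct=5, high_pct=95, n_bins=100):
    """Return indices of the frames in the most populated error interval.

    Stage 1: discard frames with very low or very high error, here those
    outside the [low_pct, high_pct] percentile range (assumed cut-offs).
    Stage 2: split the remaining range into n_bins intervals of width
    (max error - min error) / n_bins and keep the most frequent interval.
    """
    errors = np.asarray(errors, dtype=np.float64)
    lo, hi = np.percentile(errors, [low_pct, high_pct])
    keep = np.flatnonzero((errors >= lo) & (errors <= hi))
    kept = errors[keep]
    width = (kept.max() - kept.min()) / n_bins  # Eq. (2), read as interval width
    if width == 0:
        return keep  # all remaining errors are identical
    idx = np.minimum(((kept - kept.min()) / width).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    return keep[idx == np.argmax(counts)]


errors = [0.0, 5.0, 5.1, 5.2, 5.05, 20.0, 21.0, 40.0, 60.0, 900.0]
print(reduced_average_frames(errors))  # [1 2 3 4]
```

Frames 0 and 9 are dropped as outliers in stage 1, and frames 1-4, whose errors are nearly equal, form the most frequent interval.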
3.3. Keypoint Descriptor Methods
This method takes dissimilar frames into account in the computation. There are different keypoint descriptor methods, such as SIFT [17], SURF [18] and ORB [19]. Our studies found this to be the most feasible approach for finding the similarities between frames.
3.3.1. ORB
Oriented FAST and Rotated BRIEF (ORB) [19] is a fusion of the FAST keypoint detector and the BRIEF descriptor, with many modifications to enhance performance. FAST keypoints are detected at multiple scales, and the Harris corner measure is applied to them to retain the top N points. A limitation of FAST is that it does not compute orientation. To improve rotation invariance, intensity moments are computed over x and y within a circular region of radius r, where r is the size of the patch. ORB features are used to find the similarity between images: the video is first split into frames, and these frames are processed with ORB to find similar images. The reason for finding similar images is that when a person sits in a similar posture for a long time, it is probably the comfortable posture for that person.
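ORB produces binary descriptors, so the similarity between frames can be measured by matching descriptors under the Hamming distance. In OpenCV, descriptors are typically obtained with `cv2.ORB_create()` and `detectAndCompute`, then matched with `cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)`. The NumPy sketch below reimplements only the brute-force cross-check matching step so that it runs without OpenCV. The distance threshold `max_distance` is a hypothetical parameter, and the match-rate definition (matches divided by the mean descriptor count of the two frames) is our assumption.

```python
import numpy as np


def hamming_matrix(des1, des2):
    """Pairwise Hamming distances between packed binary descriptors.

    des1: (n, 32) uint8 and des2: (m, 32) uint8, i.e. 256-bit descriptors
    stored as bytes, the layout used for ORB descriptors.
    """
    xor = np.bitwise_xor(des1[:, None, :], des2[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)


def cross_check_matches(des1, des2, max_distance=64):
    """Keep pairs that are mutual nearest neighbours within max_distance."""
    d = hamming_matrix(des1, des2)
    best12 = d.argmin(axis=1)
    best21 = d.argmin(axis=0)
    return [(i, int(j)) for i, j in enumerate(best12)
            if best21[j] == i and d[i, j] <= max_distance]


def match_rate(des1, des2, max_distance=64):
    """Matches divided by the mean number of descriptors in the two frames."""
    matches = cross_check_matches(des1, des2, max_distance)
    return len(matches) / ((len(des1) + len(des2)) / 2)


rng = np.random.default_rng(0)
frame_a = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)
frame_b = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)
print(match_rate(frame_a, frame_a))  # 1.0 (identical frames)
print(match_rate(frame_a, frame_b))  # 0.0 (unrelated random descriptors)
```

Random 256-bit descriptors differ in about 128 bits on average, far above the 64-bit threshold, which is why unrelated frames produce no matches here.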
3.3.2. SIFT
The SIFT [17, 22] algorithm consists of four main steps. In the first step, the Difference of Gaussian (DoG) method is used to locate scale-space extrema. Keypoint localization is performed in the second step, together with a refinement that eliminates low-contrast points. In the third step, an orientation is assigned to each keypoint based on local image gradients. In the final step, a local image descriptor is generated for each keypoint; this descriptor is a function of the gradient magnitude and orientation.
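The first SIFT step can be illustrated with a small NumPy sketch that builds one octave of Difference-of-Gaussian images and tests a sample for being a scale-space extremum. The base scale σ0 = 1.6, the factor k = √2 and the number of levels are illustrative choices, not values from the text. Keypoint refinement, orientation assignment and descriptor generation are not shown.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with zero padding at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2

    def blur_1d(v):
        # Pad by the kernel radius so 'valid' convolution keeps the length.
        return np.convolve(np.pad(v, r), k, mode="valid")

    rows = np.apply_along_axis(blur_1d, 1, img)
    return np.apply_along_axis(blur_1d, 0, rows)


def dog_octave(img, sigma0=1.6, k=2 ** 0.5, levels=4):
    """D_i = L(k^(i+1) * sigma0) - L(k^i * sigma0) for i = 0..levels-1."""
    img = img.astype(np.float64)
    blurred = [gaussian_blur(img, sigma0 * k ** i) for i in range(levels + 1)]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(levels)])


def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is the min or max of its 3x3x3 neighbourhood."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    v = dog[s, y, x]
    return v == cube.max() or v == cube.min()


img = np.zeros((31, 31))
img[15, 15] = 255.0  # single bright point
dog = dog_octave(img)
print(dog.shape)  # (4, 31, 31)
```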
3.3.3. SURF
In SURF [18], the Laplacian of Gaussian is approximated with box filters. Square box filters are used because convolution with them can be computed quickly using integral images, and the computation can also be performed in parallel for different scales.
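Integral images are what make box filters cheap: once the cumulative-sum table is built, the sum over any axis-aligned rectangle needs only four look-ups, regardless of the box size. A minimal NumPy sketch (function names are illustrative; this shows the integral-image trick only, not SURF's full Hessian approximation):

```python
import numpy as np


def integral_image(img):
    """ii[y, x] = sum of img[:y, :x], with an extra leading row/column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] using four look-ups, independent of box size."""
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])


img = np.arange(36, dtype=np.uint8).reshape(6, 6)
ii = integral_image(img)
print(box_sum(ii, 1, 2, 4, 5))  # 135, same as img[1:4, 2:5].sum()
```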
3.4. Keypoint detection
Keypoint detection for finding the features (the angle deviations corresponding to the hip and neck) with respect to the sitting position of the person in the key frame is described in Algorithm 1. The labelling of keypoints on the human body is given in Table 1.
Figure 5: Keypoints Labeling
Data: Keyframes
Result: Angles corresponding to hip and neck
for each keyframe do
Step 1: Preprocessing: convert the image from [0, 255] to [-1, 1]:
img = img * (2.0 / 255.0) - 1.0;
Step 2: Pass the image through the neural network. The output is a heatmap matrix and a Part Affinity Fields matrix;
for each connection do
Step 3: Non-Maximum Suppression: detect the parts in the image by extracting their locations from the heatmap as local maxima, using a non-maximum suppression (NMS) algorithm:
3.1 Start at the first pixel of the heatmap;
3.2 Cover the pixel with a 5×5 window and find the maximum value in that area;
3.3 Replace the center-pixel value with that maximum;
3.4 Slide the window by one pixel and repeat until the entire heatmap has been covered;
3.5 Compare the output with the original heatmap. Pixels that keep the same value are the local maxima being searched for; set all other pixels to 0;
Step 4: Generate a complete bipartite graph;
Step 5: Apply the line integral;
Step 6: Generate a weighted bipartite graph;
Step 7: Apply the assignment algorithm:
7.1 Sort the potential connections by score;
7.2 The connection with the highest score is accepted as a final connection;
7.3 Move to the next possible connection. If none of its parts has been assigned to a final connection yet, it becomes a final connection;
7.4 Repeat step 7.3 until finished.
end
Step 8: Merging: transform the detected connections into the final skeletons;
Step 9: Identify the key points;
Step 10: Select two of them (neck and hip);
Step 11: Calculate the angles using the law of cosines;
Step 12: Convert the angles into a vector together with the frame information;
end
Algorithm 1: Angle findings through OpenPose
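Steps 3 and 7 of Algorithm 1 can be sketched in NumPy as follows. Sliding a 5×5 max filter over the heatmap and keeping the pixels whose value is unchanged is equivalent to steps 3.1-3.5. The confidence `threshold` and the `(score, part_a, part_b)` candidate format are assumptions, and this is not OpenPose's own implementation. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def heatmap_peaks(heatmap, window=5, threshold=0.1):
    """Steps 3.1-3.5: max-filter the heatmap and keep unchanged pixels.

    `threshold` (assumed) discards low-confidence peaks. Returns (y, x) pairs.
    """
    pad = window // 2
    padded = np.pad(heatmap.astype(np.float64), pad, constant_values=-np.inf)
    local_max = sliding_window_view(padded, (window, window)).max(axis=(2, 3))
    peaks = (heatmap == local_max) & (heatmap > threshold)
    return [(int(y), int(x)) for y, x in zip(*np.nonzero(peaks))]


def greedy_assignment(candidates):
    """Step 7: candidates are (score, part_a, part_b) tuples for one limb type.

    Connections are accepted in descending score order, skipping any whose
    parts were already used by an accepted connection.
    """
    used_a, used_b, final = set(), set(), []
    for score, a, b in sorted(candidates, key=lambda c: c[0], reverse=True):
        if a not in used_a and b not in used_b:
            final.append((a, b, score))
            used_a.add(a)
            used_b.add(b)
    return final


heatmap = np.zeros((20, 20))
heatmap[5, 5], heatmap[5, 6], heatmap[14, 12] = 0.9, 0.5, 0.7
print(heatmap_peaks(heatmap))  # [(5, 5), (14, 12)]
print(greedy_assignment([(0.9, 0, 0), (0.8, 0, 1), (0.7, 1, 0), (0.6, 1, 1)]))
# [(0, 0, 0.9), (1, 1, 0.6)]
```

The pixel at (5, 6) is suppressed because a larger value lies inside its 5×5 window.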
Table 1: Keypoints Labeling
Keypoint Body part
0 Nose
1 Neck
2 Right Shoulder
3 Right Elbow
4 Right Wrist
5 Left Shoulder
6 Left Elbow
7 Left Wrist
8 Right Hip
9 Right Knee
10 Right Ankle
11 Left Hip
12 Left Knee
13 Left Ankle
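Step 11 of Algorithm 1 computes angles with the law of cosines. The text does not specify which three keypoints define the neck and hip angles, so the triples below (nose-neck-right hip for the neck angle and neck-right hip-right knee for the hip angle, using the indices of Table 1) are hypothetical choices for a side-view camera.

```python
import math

# Keypoint indices from Table 1.
NOSE, NECK, R_HIP, R_KNEE = 0, 1, 8, 9


def angle_at(vertex, p, q):
    """Angle (degrees) at `vertex` in the triangle vertex-p-q.

    Law of cosines: cos(A) = (b^2 + c^2 - a^2) / (2bc), where a is the side
    opposite the vertex.
    """
    b = math.dist(vertex, p)
    c = math.dist(vertex, q)
    a = math.dist(p, q)
    cos_a = (b ** 2 + c ** 2 - a ** 2) / (2 * b * c)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))  # clamp rounding error


def posture_angles(keypoints):
    """Hypothetical neck and hip angles from a dict {index: (x, y)}."""
    neck = angle_at(keypoints[NECK], keypoints[NOSE], keypoints[R_HIP])
    hip = angle_at(keypoints[R_HIP], keypoints[NECK], keypoints[R_KNEE])
    return neck, hip


# Upright torso with the thigh horizontal.
neck, hip = posture_angles({0: (0, -10), 1: (0, 0), 8: (0, 50), 9: (40, 50)})
print(round(neck, 6), round(hip, 6))  # 180.0 90.0
```

A hip angle near 90° corresponds to the upright sitting position assumed in Section 3.1; leaning forward changes it toward the 60°-65° range mentioned there.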
3.5. Classification
After the keypoints are detected, they are fed to a classification algorithm to find the problematic frames, and different machine learning algorithms are evaluated for this. K-NN and SVM [23] classifiers were used to classify frames. K-NN is based on feature similarity. The SVM classifier works on a wide range of high-dimensional classification problems; however, it requires fine tuning of its key parameters to achieve good classification accuracy.
3.5.1. K-NN Classification
K-Nearest Neighbors [24] is a lazy algorithm that stores all training instances and classifies unknown instances based on a similarity measure. KNN has been widely used in pattern recognition and estimation problems. To predict the class of a new instance x, KNN searches the stored instances for the K most similar ones and assigns their majority class to the unknown instance. Similarity is measured using distance measures such as the Euclidean distance and the Manhattan distance.
$$\text{Euclidean} = \sqrt{\sum_{i=1}^{k}\left(x_i - y_i\right)^2} \qquad (3)$$
$$\text{Manhattan} = \sum_{i=1}^{k}\left|x_i - y_i\right| \qquad (4)$$
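A minimal NumPy sketch of K-NN with the two distance measures of Eqs. (3) and (4), using k = 7 as in the experiments of Section 4.2.1. The features (e.g. neck and hip angles) and labels below are illustrative, not data from the study. In scikit-learn, the same classifier would be `KNeighborsClassifier(n_neighbors=7, metric="euclidean")` or `metric="manhattan"`.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=7, metric="euclidean"):
    """Majority vote among the k training samples closest to `query`."""
    diff = np.asarray(train_x, dtype=np.float64) - np.asarray(query, dtype=np.float64)
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=1))  # Eq. (3)
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=1)  # Eq. (4)
    else:
        raise ValueError(f"unknown metric: {metric}")
    nearest = np.argsort(dist, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Illustrative (neck angle, hip angle) features for two risk classes.
train_x = [(90 + d, 90 - d) for d in range(-4, 4)] + [(60 + d, 120 + d) for d in range(-4, 4)]
train_y = ["normal"] * 8 + ["high"] * 8
print(knn_predict(train_x, train_y, (88, 92)))  # normal
print(knn_predict(train_x, train_y, (62, 118), metric="manhattan"))  # high
```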
3.5.2. SVM Classifier
A Support Vector Machine (SVM) [25] constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification or regression. Intuitively, a good separation is achieved by the hyperplane that has the greatest distance to the nearest training data point of any class (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier. SVM uses kernel methods to address non-linear problems. A kernel method is an algorithm that depends on the data only through dot products. In that case, the dot product can be replaced by a kernel function that computes a dot product in some possibly high-dimensional feature space. Kernel methods thus make it possible to generate non-linear decision boundaries using linear classifiers. They also allow the classifier to be applied to data that has no natural representation as a fixed-dimensional vector.
Gaussian kernel:
$$K(x, y) = \exp\left(-\frac{\lVert x - y\rVert^2}{2\sigma^2}\right) \qquad (5)$$
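Eq. (5) can be evaluated for all pairs of samples at once, giving the kernel (Gram) matrix an SVM works with. In scikit-learn, `SVC(kernel="rbf", gamma=g)` uses exp(-g·||x − y||²), so choosing g = 1/(2σ²) corresponds to Eq. (5). `SVC` also accepts a callable that returns such a matrix as its `kernel` argument. The value of σ is not reported in the text, so `sigma` is a free parameter here.

```python
import numpy as np


def gaussian_kernel_matrix(X, Y, sigma=1.0):
    """K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 sigma^2)), i.e. Eq. (5) for all pairs."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


X = [[0.0, 0.0], [1.0, 1.0]]
K = gaussian_kernel_matrix(X, X, sigma=1.0)
print(K)  # diagonal entries 1, off-diagonal entries exp(-1), about 0.3679
```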
3.5.3. CNN based classification
CNN [26] is the most popular deep learning architecture. Our study found that a CNN classifies keypoint frames more easily than KNN and SVM. Each node of the neural network learns its own rules and functions, improving itself through the experience gained during training.
The CNN architecture used is shown in Figure 6. It consists of 23 layers in total: 10 convolutional layers, 4 max pooling layers, 4 fully connected layers, 2 batch normalization layers, 2 ReLU layers and one tanh layer. The first and second layers are convolutional layers with 64 kernel filters of size 3×3, followed by a max pooling layer. The fourth and fifth layers are again convolutional layers, with 128 kernel filters of size 3×3, and the sixth layer is a max pooling layer. Similarly, the seventh and eighth layers are convolutional layers with 256 kernel filters of size 3×3, followed by a max pooling layer. The next four layers are convolutional layers with 512 kernel filters of size 3×3, followed by a max pooling layer. The next two layers are fully connected (FC) layers of size 4096. Layers seventeen and twenty are batch normalization layers, which normalize the activations over each mini-batch. Layers eighteen and twenty-one are ReLU (Rectified Linear Unit) layers, which apply the activation function f(x) = max(0, x). Layers nineteen and twenty-two are FC layers of size 4096x500 and 4096x3, respectively. The last layer applies the non-linear function tanh, where tanh(x) = 2σ(2x) − 1 and σ is the logistic sigmoid.
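The layer ordering described above can be written down as a plain Python list, which makes the layer counts easy to check. This is a framework-free sketch of our reading of the text, not the authors' model definition: input size, padding and the input widths of the FC layers are not stated and are omitted, and only the output width of each FC layer is recorded. The last lines check the tanh identity numerically.

```python
import math
from collections import Counter

# Our reading of the 23-layer CNN of Figure 6, as (type, output width) pairs.
LAYERS = (
    [("conv", 64)] * 2 + [("maxpool", None)]
    + [("conv", 128)] * 2 + [("maxpool", None)]
    + [("conv", 256)] * 2 + [("maxpool", None)]
    + [("conv", 512)] * 4 + [("maxpool", None)]
    + [("fc", 4096)] * 2
    + [("batchnorm", None), ("relu", None), ("fc", 500)]  # layers 17-19
    + [("batchnorm", None), ("relu", None), ("fc", 3)]    # layers 20-22
    + [("tanh", None)]                                    # layer 23
)

counts = Counter(kind for kind, _ in LAYERS)
print(len(LAYERS), dict(counts))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """tanh(x) = 2 * sigmoid(2x) - 1, the output non-linearity of layer 23."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


print(abs(tanh_via_sigmoid(0.7) - math.tanh(0.7)) < 1e-12)  # True
```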
3.5.4. YOLO
YOLO [27, 28] uses a CNN to perform object detection in real time. In YOLO, a single neural network is applied to the full image; it divides the image into regions and predicts bounding boxes with confidence scores. Based on the risk levels of cervical spondylosis, different target objects were created from the angles calculated with the law of cosines using the OpenPose library (neck and hip positions). In YOLO, target object detection is performed on every frame of the video, and the duration of continuous time spent in a particular position is also calculated.
Figure 6: CNN-Architecture
The YOLO architecture contains 23 convolutional layers, 5 max pooling layers, two route layers, a reorg layer and a detection layer. It starts with a convolutional layer with 32 kernel filters of size 3x3/1, followed by a max pooling layer of size 2x2/1. A convolutional layer and a max pooling layer are then repeated, with 64 kernel filters of size 3x3/1 and a pooling size of 2x2/1, respectively. The next three layers are convolutional layers with 128, 64 and 128 kernel filters of sizes 3x3/1, 1x1/1 and 3x3/1, respectively, followed by a max pooling layer of size 2x2/1. The next three layers are again convolutional layers, with 256, 128 and 256 kernel filters of sizes 3x3/1, 1x1/1 and 3x3/1, respectively, followed by a max pooling layer of size 2×2/1. The next five layers are convolutional layers that alternate between 512 and 256 kernel filters with sizes 3x3/1 and 1x1/1. The last max pooling layer, of size 2x2/1, is followed by four convolutional layers that alternate between 1024 and 512 kernel filters with sizes 3x3/1 and 1x1/1. The next three layers are also convolutional layers, with 1024 kernel filters of size 3x3/1, followed by the first route layer, which performs concatenation. This layer has 16 kernel filters. The twenty-sixth layer is a convolutional layer with 64 kernel filters of size 1x1/1. Next is a reorg layer, which reshapes the feature map, reducing its spatial size and increasing the number of channels without changing the elements. This is followed by the second route layer and then the third-last and second-last layers, which are convolutional layers with 1024 and 40 kernel filters of sizes 3x3/1 and 1x1/1, respectively. The last layer of the architecture is the detection layer.
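The reorg layer is a space-to-depth rearrangement: each stride × stride spatial block is moved into the channel dimension, so no element is lost. A NumPy sketch, assuming a height × width × channels layout and a stride of 2 (the stride is not given in the text, and Darknet's own reorg layer may order the channels differently). The 26×26×64 input shape is hypothetical.

```python
import numpy as np


def reorg(feature_map, stride=2):
    """Space-to-depth: (H, W, C) -> (H/stride, W/stride, C*stride*stride).

    Every element is kept; each stride x stride spatial block is moved into
    the channel dimension.
    """
    h, w, c = feature_map.shape
    if h % stride or w % stride:
        raise ValueError("height and width must be divisible by stride")
    x = feature_map.reshape(h // stride, stride, w // stride, stride, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/s, W/s, s, s, C)
    return x.reshape(h // stride, w // stride, stride * stride * c)


fm = np.arange(26 * 26 * 64).reshape(26, 26, 64)  # hypothetical input
out = reorg(fm)
print(out.shape)  # (13, 13, 256)
```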
3.6. Implementation
The developed system collects data using a web camera placed to the left or right side of the person. The posture is captured continuously for up to 3-4 hours at a frame rate of 17 frames per second. After recording, the video is split into frames of 640 × 480 pixels.
The first phase is to find the key frames and the time duration of each changed frame in a captured video. A key frame is a frame that differs from the previous frame, and the time duration is the time that elapses before the change occurs. For example, the subject in the captured video sits straight for some time and then changes his or her sitting position to a leaning position; the system calculates the time duration between the sitting frame and the changed leaning frame and also identifies the key frame. Of the three image matching techniques, namely the Mean Squared Error (MSE) method, the Reduced Average method and the keypoint descriptor method, the keypoint descriptor method gives the most accurate result, as it considers dissimilar frames. Compared with SIFT and SURF, ORB extracts keypoints more efficiently. The identified key frames are used as input to the OpenPose library for post-processing.
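Once each frame has a posture label, the durations used for classification follow from run-length encoding the label sequence and dividing by the frame rate of 17 frames per second. A sketch with hypothetical label strings; `posture_runs` is an illustrative name.

```python
from itertools import groupby

FPS = 17  # frame rate used in the implementation (Section 3.6)


def posture_runs(labels, fps=FPS):
    """Collapse per-frame posture labels into (label, start_s, duration_s) runs."""
    runs, frame = [], 0
    for label, group in groupby(labels):
        n = sum(1 for _ in group)
        runs.append((label, frame / fps, n / fps))
        frame += n
    return runs


labels = ["straight"] * 34 + ["leaning"] * 17  # hypothetical per-frame labels
print(posture_runs(labels))  # [('straight', 0.0, 2.0), ('leaning', 2.0, 1.0)]
```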
OpenPose [20] is the first real-time multi-person system to detect human body, hand, facial and foot keypoints on single images, with a total of 135 keypoints. Of these 135 keypoints, only those of the neck and hip (spine) positions are considered in this study. Using the keypoints at these positions, lean angles are computed with the law of cosines, and the images are then tested against the model. For testing, the two angles and the time factors are considered.
For classification, the posture position, the duration of the posture and the continuity of the posture over a period are collected. Based on this information, posture classification is performed to determine the risk level of cervical spondylosis. For the classification task, the CNN-based method shows better performance.
4. Result Analysis
4.1. Performance of different descriptors to obtain the best frame
Table 2 lists the performance of the different keypoint detection techniques. Each algorithm was evaluated on images with varying intensity values and on augmented images (rotated, scaled and sheared). Each case is evaluated by the time needed to extract the keypoint descriptors, the number of keypoints detected in subsequent images/frames, the number of matches and the average match rate. The overall match rate of ORB is higher than that of SURF and SIFT, and ORB requires the least computation time in all cases (tied with SURF for the rotated image). For the rotated image, ORB provides better results than the other two algorithms. For scaling, the image was scaled by a factor of two; here the highest average match rate is obtained by SIFT and the lowest by ORB. For shearing, the original image was sheared with a value of 0.5, and ORB has the highest overall match rate.
Table 2: Performance Evaluation
                                      ORB     SURF    SIFT
Images with various intensity
  Time (s)                            0.03    0.04    0.13
  Keypoints detected in first image   248     162     261
  Keypoints detected in second image  229     166     267
  No. of matches                      183     119     168
  Average match rate (%)              76.7    72.6    63.6
Image with its rotated image
  Time (s)                            0.03    0.03    0.16
  Keypoints detected in first image   248     162     261
  Keypoints detected in second image  260     271     423
  No. of matches                      166     110     158
  Average match rate (%)              65.4    50.8    46.2
Image with its scaled image
  Time (s)                            0.02    0.08    0.25
  Keypoints detected in first image   248     162     261
  Keypoints detected in second image  1210    581     471
  No. of matches                      232     136     181
  Average match rate (%)              31.8    36.6    49.5
Image with its sheared image
  Time (s)                            0.026   0.049   0.133
  Keypoints detected in first image   248     162     261
  Keypoints detected in second image  229     214     298
  No. of matches                      150     111     145
  Average match rate (%)              62.89   59.04   51.88
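Table 2 does not state how the average match rate is computed. The reported values are consistent with the number of matches divided by the mean number of keypoints detected in the two images. For example, for ORB on images with various intensity, 183 / ((248 + 229) / 2) ≈ 76.7%. The helper below encodes this inferred formula; it is our reconstruction, not a definition given by the authors.

```python
def average_match_rate(matches, kp_first, kp_second):
    """Matches as a percentage of the mean keypoint count of the two images
    (formula inferred from the values in Table 2)."""
    return 100.0 * matches / ((kp_first + kp_second) / 2)


# ORB, images with various intensity: 183 matches, 248 and 229 keypoints.
print(round(average_match_rate(183, 248, 229), 1))  # 76.7
# SIFT, image with its scaled image: 181 matches, 261 and 471 keypoints.
print(round(average_match_rate(181, 261, 471), 1))  # 49.5
```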
4.2. Performance of different classifiers
Classification is evaluated using different metrics. Accuracy is the most commonly used metric; other criteria include precision, sensitivity, specificity, False Negative Rate (FNR), False Positive Rate (FPR) and F1 score. Precision, or PPV (Positive Predictive Value), is the fraction of samples predicted as positive that are actually positive. The False Negative Rate (FNR) is the ratio of the number of positive samples wrongly reported as negative (i.e., false negatives) to the total number of actual positive samples. The error rate is the ratio of the number of incorrect classifications to the number of test samples evaluated, which makes it simple to quantify. Specificity, or TNR (True Negative Rate), measures the proportion of actual negatives that are correctly classified, while Sensitivity, TPR (True Positive Rate) or Recall measures the proportion of actual positives that are correctly classified; it is the conditional probability that a diagnostic test correctly detects the illness. The F1-score is the harmonic mean of sensitivity and precision. The False Positive Rate (FPR) is the ratio of the number of wrongly reported positives (i.e., false positives) to the total number of actual negatives. A classifier with higher precision, accuracy, specificity, sensitivity, NPV and F1-score is considered to be more efficient. These metrics are given in Table 3.
Table 3: Evaluation Metrics
Metric                  Formula
Sensitivity (recall)    TPR = TP / (TP + FN)
Specificity             SPC = TN / (TN + FP)
Precision               PPV = TP / (TP + FP)
False Positive Rate     FPR = FP / (FP + TN)
False Discovery Rate    FDR = FP / (FP + TP)
False Negative Rate     FNR = FN / (FN + TP)
Accuracy                ACC = (TP + TN) / (P + N), where P = TP + FN and N = TN + FP
F1 Score                F1 = 2TP / (2TP + FP + FN)
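As a worked example of the formulas in Table 3, the following Python sketch computes every metric from confusion-matrix counts. The counts used (TP = 23, FP = 2, FN = 4, TN = 21, i.e. 50 test samples) are not reported in the paper; they are our own back-calculation, included only as an assumption because they reproduce the CNN row of Table 7 after rounding to four decimals. `Confusion` and `metrics` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (positive = the condition being detected)."""
    tp: int
    fp: int
    fn: int
    tn: int


def metrics(c: Confusion) -> dict[str, float]:
    """Compute the Table 3 metrics from confusion-matrix counts."""
    p = c.tp + c.fn  # actual positives
    n = c.tn + c.fp  # actual negatives
    return {
        "recall": c.tp / p,                 # TPR / sensitivity
        "specificity": c.tn / n,            # SPC / TNR
        "precision": c.tp / (c.tp + c.fp),  # PPV
        "fpr": c.fp / n,
        "fdr": c.fp / (c.fp + c.tp),
        "fnr": c.fn / p,
        "accuracy": (c.tp + c.tn) / (p + n),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# Hypothetical counts, back-calculated (not given in the paper) so that the
# rounded results match the CNN row of Table 7.
cnn = metrics(Confusion(tp=23, fp=2, fn=4, tn=21))
for name, value in cnn.items():
    print(f"{name:12s}{value:.4f}")
```

Computing all metrics from one set of counts makes the relationships in Table 3 explicit; for example, FNR and recall always sum to one because they share the denominator P.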
4.2.1. KNN
The performance of KNN was measured with two distance measures, Euclidean and Manhattan, as shown in Table 4. The results were obtained with k = 7. Both distances give very similar recall, 0.7000 for Euclidean and 0.6977 for Manhattan distance, whereas the Euclidean distance performs better in accuracy and precision.
Table 4: KNN Performance Evaluation
Distance measure      Accuracy  Precision  Recall
Euclidean distance    0.7115    0.7778     0.7000
Manhattan distance    0.6982    0.7059     0.6977
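A minimal pure-Python sketch of the k-nearest-neighbour rule with the two distance measures compared in Table 4 and k = 7. The paper does not describe its KNN implementation or its feature vectors, so the 2-D points and the "normal"/"abnormal" labels below are toy values invented for illustration, and `knn_predict` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance."""
    return math.dist(a, b)


def manhattan(a: Vector, b: Vector) -> float:
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(
    train_x: Sequence[Vector],
    train_y: Sequence[str],
    query: Vector,
    k: int = 7,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: distance(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors and labels, invented for illustration only.
TRAIN_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8), (5.8, 5.2)]
TRAIN_Y = ["normal"] * 7 + ["abnormal"] * 7

print(knn_predict(TRAIN_X, TRAIN_Y, (0.4, 0.6)))                      # normal
print(knn_predict(TRAIN_X, TRAIN_Y, (5.1, 5.4), distance=manhattan))  # abnormal
```

Passing the distance function as a parameter means that switching from Euclidean to Manhattan distance changes nothing else in the classifier, which mirrors the comparison in Table 4.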
4.2.2. SVM
Two kernels were tried with SVM. The Gaussian kernel gives better accuracy (0.8039) and recall (0.8400) than the RBF kernel, while the RBF kernel gives slightly better precision (0.7895, versus 0.7778 for the Gaussian kernel).
Table 5: SVM Performance Evaluation
Kernel            Accuracy  Precision  Recall
Gaussian kernel   0.8039    0.7778     0.8400
RBF kernel        0.7867    0.7895     0.7895
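Table 5 reports two kernel configurations but not their parameters. For reference, the Gaussian kernel is a radial basis function kernel, and its two common parameterizations, with a width σ or with a coefficient γ (the form used by scikit-learn's `SVC(kernel="rbf", gamma=...)`), coincide when γ = 1/(2σ²). The sketch below illustrates this; how the paper's "Gaussian" and "RBF" configurations actually differ (for example, in their width settings) is not stated, so any such interpretation is an assumption. Both function names are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Gaussian kernel written with a width sigma: exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, z) ** 2 / (2 * sigma ** 2))


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel written with a coefficient gamma: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * math.dist(x, z) ** 2)


# Illustrative points and width (assumed values, not from the paper).
x, z = (1.0, 2.0), (2.0, 0.5)
sigma = 0.8
print(gaussian_kernel(x, z, sigma), rbf_kernel(x, z, gamma=1 / (2 * sigma ** 2)))
```

Both calls print the same similarity, so any difference between two such kernels in practice comes from the chosen width (σ or γ) and the SVM regularization, not from the functional form.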
4.2.3. Deep learning
For deep learning, we compared the performance of a CNN with the YOLO algorithm. Performance was evaluated in two settings, with and without hyper-parameter tuning, as shown in Table 6. Without hyper-parameter tuning, YOLO has slightly better precision than the CNN (83.09% versus 82.77%), while the CNN has higher accuracy and recall. With hyper-parameter tuning, the CNN outperforms YOLO on accuracy, precision and recall. This result was obtained after 6000 epochs, as illustrated in Figure 7.
Table 6: Performance Evaluation
                                Accuracy  Precision  Recall
With hyper-parameter tuning
  CNN                           0.8800    0.9200     0.8519
  YOLO                          0.8700    0.9000     0.8491
Without hyper-parameter tuning
  CNN                           0.8613    0.8277     0.8224
  YOLO                          0.8210    0.8309     0.7734
Figure 7: Accuracy vs epochs
4.3. Overall Performance
Table 7 shows the performance of the machine learning models under consideration: CNN and YOLO from deep learning, and SVM and K-NN from shallow learning. Of these four models, the CNN is the most accurate.
4.3.1. CNN
CNN [23] is one of the most widely used deep learning architectures. In the present study, the CNN combined with the OpenPose library gave the highest accuracy within a limited time. The CNN also outperformed the other models on the remaining performance metrics: its accuracy, precision, recall (sensitivity), F1-score and specificity were the highest of all models, at 0.8800, 0.9200, 0.8519, 0.8846 and 0.9130 respectively. Its FPR, FDR and FNR (0.0870, 0.0800 and 0.1481) were the lowest, indicating a good result. Since accuracy, precision, recall and F1-score are all high, the results of this algorithm can be considered satisfactory.
4.3.2. YOLO
YOLO is a real-time object detection system based on deep learning. Its scores are lower than those of the CNN, but in this study it scored higher than SVM and K-NN, with accuracy, precision, recall, F1-score and specificity of 0.8700, 0.9000, 0.8491, 0.8738 and 0.8491 respectively. Its FPR, FDR and FNR are also lower than those of SVM and K-NN.
4.3.3. SVM
SVM belongs to the general category of kernel methods and is a shallow learning algorithm. Its performance is lower than that of the deep learning models but better than that of K-NN: it scores higher on accuracy, recall, F1-score and specificity, and ties K-NN on precision (0.7778).
Table 7: Performance Metrics
Algorithm  Accuracy  Precision  Recall  F1-score  Specificity  FPR     FDR     FNR
CNN        0.8800    0.9200     0.8519  0.8846    0.9130       0.0870  0.0800  0.1481
YOLO       0.8700    0.9000     0.8491  0.8738    0.8491       0.1509  0.1000  0.1509
SVM        0.8039    0.7778     0.8400  0.8077    0.7692       0.2308  0.2222  0.1600
K-NN       0.7115    0.7778     0.7000  0.7368    0.7273       0.2727  0.2222  0.3000
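The columns of Table 7 are linked by the definitions in Table 3: F1 is the harmonic mean of precision and recall, and FPR = 1 - specificity, FDR = 1 - precision, FNR = 1 - recall. The Python sketch below uses these identities to cross-check each reported row, with a tolerance of 10^-3 to allow for four-decimal rounding; `TABLE_7`, `f1_from` and `check_row` are hypothetical names introduced for illustration.

```python
# Reported values from Table 7:
# (precision, recall, specificity, F1, FPR, FDR, FNR)
TABLE_7 = {
    "CNN":  (0.9200, 0.8519, 0.9130, 0.8846, 0.0870, 0.0800, 0.1481),
    "YOLO": (0.9000, 0.8491, 0.8491, 0.8738, 0.1509, 0.1000, 0.1509),
    "SVM":  (0.7778, 0.8400, 0.7692, 0.8077, 0.2308, 0.2222, 0.1600),
    "K-NN": (0.7778, 0.7000, 0.7273, 0.7368, 0.2727, 0.2222, 0.3000),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def check_row(row: tuple, tol: float = 1e-3) -> dict[str, bool]:
    """Check one Table 7 row against the Table 3 identities."""
    precision, recall, spec, f1, fpr, fdr, fnr = row
    return {
        "f1": abs(f1_from(precision, recall) - f1) <= tol,
        "fpr": abs((1 - spec) - fpr) <= tol,
        "fdr": abs((1 - precision) - fdr) <= tol,
        "fnr": abs((1 - recall) - fnr) <= tol,
    }


for model, row in TABLE_7.items():
    print(model, check_row(row))
```

Every row passes these checks, so the reported F1-scores and error rates are internally consistent with the reported precision, recall and specificity.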
5. Conclusions
This paper summarizes the design and development of a system that detects incorrect posture in order to identify cervical spondylosis in its initial stages. Experiments performed to test the proposed system show that it achieves appreciable accuracy (0.88 with the CNN). The dataset was acquired through a web camera, and the necessary preprocessing steps were performed to enhance its quality. The system was trained on pre-recorded data to distinguish between four classes related to cervical spondylosis. Its main advantage is that end users can evaluate the correctness of their posture in real time, and the resulting classification helps to assess the risk level of CS conveniently.
The system could also be enhanced by performing the classification in parallel. Modified deep learning algorithms could be used to improve its accuracy further, and the idea of the proposed assistive system could also be extended to a low-cost device.
References
[1] F. Lees, J. A. Turner, Natural history and prognosis of cervical spondylosis, British Medical Journal 2 (5373) (1963) 1607.
[2] A. I. Binder, Cervical spondylosis and neck pain, BMJ 334 (7592) (2007) 527–531.
[3] D. Glew, I. Watt, P. Dieppe, P. Goddard, MRI of the cervical spine: rheumatoid arthritis compared with cervical spondylosis, Clinical Radiology 44 (2) (1991) 71–76.
[4] R. A. Deyo, S. K. Mirza, B. I. Martin, Back pain prevalence and visit rates: estimates from US national surveys, 2002, Spine 31 (23) (2006) 2724–2727.
[5] R. A. Deyo, S. K. Mirza, B. I. Martin, Back pain prevalence and visit rates: estimates from US national surveys, 2002, Spine 31 (23) (2006) 2724–2727.
[6] L. Brain, M. Wilkinson, Cervical spondylosis and other disorders of the cervical spine, Butterworth-Heinemann, 2013.
[7] C. Heller, P. Stanley, B. Lewis-Jones, R. Heller, Value of x ray examinations of the cervical spine, Br Med J (Clin Res Ed) 287 (6401) (1983) 1276–1278.
[8] S. A. Olarinoye-Akorede, P. O. Ibinaiye, A. Akano, A. U. Hamidu, G. A. Kajogbola, et al., Magnetic resonance imaging findings in cervical spondylosis and cervical spondylotic myelopathy in Zaria, Northern Nigeria, Sub-Saharan African Journal of Medicine 2 (2) (2015) 74.
[9] D. B. Nunez Jr, A. Zuluaga, D. A. Fuentes-Bernardo, L. A. Rivas, J. L. Becerra, Cervical spine trauma: how much more do we learn by routinely using helical CT?, Radiographics 16 (6) (1996) 1307–1318.
[10] P. P. Chitte, U. M. Gokhale, Analysis of different methods for identification and classification of cervical spondylosis (CS): A survey, International Journal of Applied Engineering Research 12 (21) (2017) 11727–11737.
[11] K. Hirano, K. Shoda, K. Kitamura, Y. Miyazaki, Y. Nishida, Method for behavior normalization to enable comparative understanding of interactions of elderly persons with consumer products using a behavior video database, Procedia Computer Science 160 (2019) 409–416.
[12] M. Ariz, A. Villanueva, R. Cabeza, Robust and accurate 2D-tracking-based 3D positioning method: Application to head pose estimation, Computer Vision and Image Understanding 180 (2019) 13–22.
[13] E. Ramirez, P. Melin, G. Prado-Arechiga, Hybrid model based on neural networks, type-1 and type-2 fuzzy systems for 2-lead cardiac arrhythmia classification, Expert Systems with Applications 126 (2019) 295–307.
[14] I. Miramontes, J. C. Guzman, P. Melin, G. Prado-Arechiga, Optimal design of interval type-2 fuzzy heart rate level classification systems using the bird swarm algorithm, Algorithms 11 (12) (2018) 206.
[15] P. Melin, I. Miramontes, G. Prado-Arechiga, A hybrid model based on modular neural networks and fuzzy systems for classification of blood pressure and hypertension risk diagnosis, Expert Systems with Applications 107 (2018) 146–164.
[16] O. Castillo, P. Melin, E. Ramírez, J. Soria, Hybrid intelligent system for cardiac arrhythmia classification with fuzzy k-nearest neighbors and neural networks combined with a fuzzy system, Expert Systems with Applications 39 (3) (2012) 2947–2955.
[17] D. G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of the Seventh IEEE International Conference on Computer Vision, Vol. 2, 1999, pp. 1150–1157.
[18] H. Bay, T. Tuytelaars, L. Van Gool, SURF: Speeded up robust features, in: A. Leonardis, H. Bischof, A. Pinz (Eds.), Computer Vision – ECCV 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 404–417.
[19] E. Rublee, V. Rabaud, K. Konolige, G. Bradski, ORB: An efficient alternative to SIFT or SURF, in: 2011 International Conference on Computer Vision, 2011, pp. 2564–2571.
[20] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, Y. Sheikh, OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields, arXiv preprint arXiv:1812.08008, 2018.
[21] D. Wackerly, W. Mendenhall, R. L. Scheaffer, Mathematical statistics with applications, Cengage Learning, 2014.
[22] I. Păvăloi, A. Ignat, Iris image classification using SIFT features, Procedia Computer Science 159 (2019) 241–250.
[23] N. S. A. ALEnezi, A method of skin disease detection using image processing and machine learning, Procedia Computer Science 163 (2019) 85–92.
[24] N. S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician 46 (3) (1992) 175–185. doi:10.1080/00031305.1992.10475879.
[25] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273–297.
[26] Y. LeCun, Y. Bengio, et al., Convolutional networks for images, speech, and time series, The handbook of brain theory and neural networks 3361 (10) (1995) 1995.
[27] D. T. Nguyen, T. N. Nguyen, H. Kim, H.-J. Lee, A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27 (8) (2019) 1861–1873.
[28] Y. Tian, G. Yang, Z. Wang, H. Wang, E. Li, Z. Liang, Apple detection during different growth stages in orchards using the improved YOLO-V3 model, Computers and Electronics in Agriculture 157 (2019) 417–426.
... Nonetheless, MRI is of weak correlation between the magnetic resonance findings and clinical symptoms (Yu and Xiang 2014). In addition, it is time-consuming and costly (Sreeraj et al. 2022). For these imaging methods, the accuracy mainly depends on clinician's medical knowledge and clinical experiences, which are subjective and error-prone. ...
... The reported accuracy was 79.33% when involving 87 CSM cases and 243 non-CSM cases. Sreeraj et al. (2022) designed a system to detect incorrect posture for detecting cervical spondylosis developing in initial stages. Keypoints were detected to associate body parts with human image, which can easily identify key points in hand, foot and body. ...
Article
Full-text available
Cervical spondylotic myelopathy (CSM) is the main cause of cervical spinal cord dysfunction in adults, especially in middle-aged and elderly patients, which easily leads to gait disturbance. In the present study, we propose a dynamic method for the detection of CSM based on nonlinear dynamics of gait system and deterministic learning theory. First, a 3-dimensional (3D) gait analysis system is used to capture the walking locomotion from healthy controls (HCs) and patients with CSM. Discriminant kinematic gait features, including angles of hip and knee joints in the sagittal and coronal planes, are extracted based on statistical analysis and clinicians’ empirical investigation. Second, deterministic learning theory is used to model and identify nonlinear gait system dynamics of HCs and patients with CSM, which are approximated and stored in constant Radial Basis Function (RBF) neural networks (NN). The disparity of gait system dynamics between the two groups of participants is used for classification and detection of the presence of CSM by constructing a bank of dynamic estimators with constant RBF NN. Finally, experiments are carried out on the self-constructed CSM gait database to evaluate the performance of the proposed method, in which gait data from 45 CSM patients and 45 age-matched HCs are involved. By using 2-fold and leave-one-out cross-validation styles, the achieved average classification accuracy is reported to be 94.44\(\%\) and 95.56\(\%\), respectively. The results demonstrate excellent performance and the proposed method has the potential to serve as a candidate for the automatic detection of CSM in clinical examination.
... It has been indicated that the accuracy of manual X-ray diagnosis for CS is only 68.3% [11]. Conversely, the application of artificial intelligence (AI) offers a promising avenue for the prediction and diagnosis of CS [22]. Yu [23] developed a CS classification model using fuzzy computing theory, achieving an accuracy of 80.33%, thus demonstrating the potential of machine learning in classifying and processing various imaging features effectively. ...
Article
Full-text available
The increase in Cervical Spondylosis cases and the expansion of the affected demographic to younger patients have escalated the demand for X-ray screening. Challenges include variability in imaging technology, differences in equipment specifications, and the diverse experience levels of clinicians, which collectively hinder diagnostic accuracy. In response, a deep learning approach utilizing a ResNet-34 convolutional neural network has been developed. This model, trained on a comprehensive dataset of 1235 cervical spine X-ray images representing a wide range of projection angles, aims to mitigate these issues by providing a robust tool for diagnosis. Validation of the model was performed on an independent set of 136 X-ray images, also varied in projection angles, to ensure its efficacy across diverse clinical scenarios. The model achieved a classification accuracy of 89.7%, significantly outperforming the traditional manual diagnostic approach, which has an accuracy of 68.3%. This advancement demonstrates the viability of deep learning models to not only complement but enhance the diagnostic capabilities of clinicians in identifying Cervical Spondylosis, offering a promising avenue for improving diagnostic accuracy and efficiency in clinical settings.
... [63] have shown that a random forest (RF) model does not require much effort in tuning hyper-parameters and that it performs effectively. Table 3. [75][76][77] Finally, we compared the outputs of the models with each other. We measured the prediction accuracy using the mean absolute percentage error (MAPE)(see [26,15] for further examples) and the Mean Absolute Error (MAE) (see [75] for further example) for both the base model and the social media prediction model. ...
Article
Warranty plays an important role in retaining consumers' loyalty, increasing the competitive advantage and the profit of companies. Moreover, warranty claim prediction based on social media is a novel area, enabling managers to foresee problems in production and take the proper measures to mitigate them. The higher the precision of the warranty claim predictions, the lower the risk the company faces. This paper examines the impacts of utilizing social media data on daily warranty claim prediction. In this paper, we showed that social media data could enhance the accuracy of daily warranty claim predictions. We cooperated with Sam Service Warranty Company that provides warranty and aftersales services for Samsung products in Iran. Warranty operational data along with Twitter data analyses were used to improve the precision of warranty claim prediction. Operational data from Sam Service Company include the total number of warranties, the number of warranties for new customers, and the number of warranties for those who return. A novel framework was presented that uses the Random Forrest algorithm for prediction of the number of daily warranty claims. The results show that our framework improves the accuracy of out-of-sample warranty claims predictions, with respective development at a range of 14.98% to 21.90% across various timeframes. Improving prediction accuracy enables managers to effectively minimize warranty-related costs, inventory levels, waste, and customer dissatisfaction while maximizing the return on investment, profit, efficiency, and customer satisfaction.
Article
Full-text available
In the cervical region of middle-aged and elderly patients, cervical spondylotic myelopathy (CSM) is frequently recognized as the primary factor that contributes to spinal cord dysfunction. Numbness and gait disturbance are the main clinical manifestations of CSM, which exhibits as a stiff and spastic gait in comparison with that of healthy controls (HCs). Because it is difficult to screen CSM in the primary stage which easily leading to a delay in medication, the identification of CSM followed by treatment is urgent. The aim of this study is to develop an automated classification method for the screening of CSM, using fifty-four lower extremity kinematic parameters derived from three-dimensional gait analysis. The present study employs a deep neural network (DNN) model to automatically extract informative features from raw gait kinematic data. Hierarchically placed layers in the DNN produce deep feature maps that are used to screen CSM using multiple shallow classifiers. The proposed method is evaluated using a self-constructed gait database of patients diagnosed with CSM and HCs, both groups consisting of 45 individuals within a similar age range. Experimental results reveal that the combination of deep features and shallow classifiers yields remarkable accuracy rates for binary classification with twofold, tenfold, and leave-one-out cross-validation methods, all achieving an accuracy of 99.44 %\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{\%}$$\end{document}. The data suggest that our approach is efficient in detecting the early onset CSM and performs better than other cutting-edge techniques.
Article
Full-text available
Convolutional neural networks (CNNs) require numerous computations and external memory accesses. Frequent accesses to off-chip memory cause slow processing and large power dissipation. For real-time object detection with high throughput and power efficiency, this paper presents a Tera-OPS streaming hardware accelerator implementing a you-only-look-once (YOLO) CNN. The parameters of the YOLO CNN are retrained and quantized with the PASCAL VOC data set using binary weight and flexible low-bit activation. The binary weight enables storing the entire network model in block RAMs of a field-programmable gate array (FPGA) to reduce off-chip accesses aggressively and, thereby, achieve significant performance enhancement. In the proposed design, all convolutional layers are fully pipelined for enhanced hardware utilization. The input image is delivered to the accelerator line-by-line. Similarly, the output from the previous layer is transmitted to the next layer line-by-line. The intermediate data are fully reused across layers, thereby eliminating external memory accesses. The decreased dynamic random access memory (DRAM) accesses reduce DRAM power consumption. Furthermore, as the convolutional layers are fully parameterized, it is easy to scale up the network. In this streaming design, each convolution layer is mapped to a dedicated hardware block. Therefore, it outperforms the “one-size-fits-all” designs in both performance and power efficiency. This CNN implemented using VC707 FPGA achieves a throughput of 1.877 tera operations per second (TOPS) at 200 MHz with batch processing while consuming 18.29 W of on-chip power, which shows the best power efficiency compared with the previous research. As for object detection accuracy, it achieves a mean average precision (mAP) of 64.16% for the PASCAL VOC 2007 data set that is only 2.63% lower than the mAP of the same YOLO network with full precision.
Article
Full-text available
Skin diseases are more common than other diseases. Skin diseases may be caused by fungal infection, bacteria, allergy, or viruses, etc. The advancement of lasers and Photonics based medical technology has made it possible to diagnose the skin diseases much more quickly and accurately. But the cost of such diagnosis is still limited and very expensive. So, image processing techniques help to build automated screening system for dermatology at an initial stage. The extraction of features plays a key role in helping to classify skin diseases. Computer vision has a role in the detection of skin diseases in a variety of techniques. Due to deserts and hot weather, skin diseases are common in Saudi Arabia. This work contributes in the research of skin disease detection. We proposed an image processing-based method to detect skin diseases. This method takes the digital image of disease effect skin area, then use image analysis to identify the type of disease. Our proposed approach is simple, fast and does not require expensive equipment other than a camera and a computer. The approach works on the inputs of a color image. Then resize the of the image to extract features using pretrained convolutional neural network. After that classified feature using Multiclass SVM. Finally, the results are shown to the user, including the type of disease, spread, and severity. The system successfully detects 3 different types of skin diseases with an accuracy rate of 100%.
Article
Full-text available
Consumer product safety for dementia sufferers is a global problem. To develop products that can be safely used by elderly people with degradation of physical and cognitive functions (e.g., people with dementia), it is necessary to measure the product use behavior of elderly people using them in various environments and quantitatively analyze behavioral changes during product use due to changes in the functions. Recent developments in smart home technology are opening a new path for quantifying the behavior of elderly people in daily environments. This proposes a new method for comparative understanding of the elderly’s interactions with consumer products. The method employs three functions: robust pose estimation, behavior normalization method, and clustering. This paper also describes the evaluation of the developed functions and report their application to analyzing an elderly behavior library, which is an RGB-D database for elderly product use in daily environments.
Article
Full-text available
The object of interest of this paper is automatic iris classification when dealing with missing information. Our approach uses and extends a method for face recognition, based on Scale Invariant Feature Transform (SIFT). We adapted this method for iris classification and tested it on occluded iris images. We add to the keypoint matching procedure new conditions that improve the classification rate. We tested different parameters involved in the SIFT extraction process and the keypoint matching scheme on eleven image datasets with different levels of occlusion. For testing, a standardized segmented UPOL iris database was employed. We experimentally prove that the proposed approach has better results when compared with both the original method and the Daugman procedure on all datasets.
Article
Full-text available
Head pose estimation (HPE) is currently a growing research field, mainly because of the proliferation of human–computer interfaces (HCI) in the last decade. It offers a wide variety of applications, including human behavior analysis, driver assistance systems or gaze estimation systems. This article aims to contribute to the development of robust and accurate HPE methods based on 2D tracking of the face, enhancing performance of both 2D point tracking and 3D pose estimation. We start with a baseline method for pose estimation based on POSIT algorithm. A novel weighted variant of POSIT is then proposed, together with a methodology to estimate weights for the 2D–3D point correspondences. Further, outlier detection and correction methods are also proposed in order to enhance both point tracking and pose estimation. With the aim of achieving a wider impact, the problem is addressed using a global approach: all the methods proposed are generalizable to any kind of object for which an approximate 3D model is available. These methods have been evaluated for the specific task of HPE using two different head pose video databases; a recently published one that reflects the expected performance of the system in current technological conditions, and an older one that allows an extensive comparison with state-of-the-art HPE methods. Results show that the proposed enhancements improve the accuracy of both 2D facial point tracking and 3D HPE, with respect to the implemented baseline method, by over 15% in normal tracking conditions and over 30% in noisy tracking conditions. Moreover, the proposed HPE system outperforms the state of the art on the two databases.
Article
Full-text available
In this paper, the optimal designs of type-1 and interval type-2 fuzzy systems for the classification of the heart rate level are presented. The contribution of this work is a proposed approach for achieving the optimal design of interval type-2 fuzzy systems for the classification of the heart rate in patients. The fuzzy rule base was designed based on the knowledge of experts. Optimization of the membership functions of the fuzzy systems is done in order to improve the classification rate and provide a more accurate diagnosis, and for this goal the Bird Swarm Algorithm was used. Two different type-1 fuzzy systems are designed and optimized, the first one with trapezoidal membership functions and the second with Gaussian membership functions. Once the best type-1 fuzzy systems have been obtained, these are considered as a basis for designing the interval type-2 fuzzy systems, where the footprint of uncertainty was optimized to find the optimal representation of uncertainty. After performing different tests with patients and comparing the classification rate of each fuzzy system, it is concluded that fuzzy systems with Gaussian membership functions provide a better classification than those designed with trapezoidal membership functions. Additionally, tests were performed with the Crow Search Algorithm to carry out a performance comparison, with Bird Swarm Algorithm being the one with the best results.
Article
Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that using a PAF-only refinement is able to achieve a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
Article
This paper describes an approach using computational intelligence methods to form a hybrid model as a classification method for 2-lead cardiac arrhythmias. The hybridization of methods can increase the performance in a system and take advantage of the benefits offered by such techniques in solving complex problems. The interpretation of electrocardiograms is a useful task for physicians, but when it comes to reviewing more than 24 h of information, it becomes a laborious task for them. For this reason, the design a computational model that helps in such a task is very useful for the timely medical diagnosis. The hybrid model is build using artificial neural networks and fuzzy logic. Training and testing of the hybrid model was with the Massachusetts Institute of Technology and Beth Israel Hospital (MIT-BIH) arrhythmia database. The heartbeats are preprocessed to improve results of classification. Ten different classes of normal and arrhythmia signals for building the hybrid model are considered. We used two electrode signals or leads included in the MIT-BIH arrhythmia database, MLII and V1, V2, or V3 as second electrode signal. The hybrid model is composed by two basic module units, as described below. A basic module unit to perform the classification for each signal lead is used. Each basic module unit is composed of three different classifiers based on the following models: fuzzy KNN algorithm, multilayer perceptron with gradient descent and momentum (MLP-GDM), and multilayer perceptron with scaled conjugate gradient backpropagation (MLP-SCG). The outputs from the classifiers are combined using a fuzzy system for integration of results. We designed two fuzzy systems, Mamdani type-1 fuzzy system (type-1 FIS) and an interval type-2 fuzzy system (IT2FIS). The reason is to perform a comparison between type-1 FIS and IT2FIS in the hybrid model. We have obtained best results in the classification rate using IT2FIS instead of type-1 FIS in the basic units. 
Finally, a type-1 FIS is used to determine the global classification for the 2 basic units in hybrid model. We obtained a good classification rate in each basic module unit, 92.90% and 92.70% of classification rate for basic modules unit 1 and unit 2 respectively. Finally, we obtained a 93.80% when used type-1 FIS and 94.20% of classification rate used IT2FIS combining both basic module units. In the results presented, we improve the global classification in proposed hybrid model combining neural networks and fuzzy logic used both signal lead included in MIT-BIH arrhythmia database. The proposed hybrid model maybe extended to use multi-lead arrhythmia classification using other databases that contain 12 leads to be able to make a complete medical diagnosis.
Article
Real-time detection of apples in orchards is one of the most important methods for judging growth stages of apples and estimating yield. The size, colour, cluster density, and other growth characteristics of apples change as they grow. Traditional detection methods can only detect apples during a particular growth stage, but these methods cannot be adapted to different growth stages using the same model. We propose an improved YOLO-V3 model for detecting apples during different growth stages in orchards with fluctuating illumination, complex backgrounds, overlapping apples, and branches and leaves. Images of young apples, expanding apples, and ripe apples are initially collected. These images are subsequently augmented using rotation transformation, colour balance transformation, brightness transformation, and blur processing. The augmented images are used to create training sets. The DenseNet method is used to process feature layers with low resolution in the YOLO-V3 network. This effectively enhances feature propagation, promotes feature reuse, and improves network performance. After training the model, the performance of the trained model is tested on a test dataset. The test results show that the proposed YOLOV3-dense model is superior to the original YOLO-V3 model and the Faster R-CNN with VGG16 net model, which is the state-of-art fruit detection model. The average detection time of the model is 0.304s per frame at 3000 × 3000 resolution, which can provide real-time detection of apples in orchards. Moreover, the YOLOV3-dense model can effectively provide apple detection under overlapping apples and occlusion conditions, and can be applied in the actual environment of orchards.
Article
In this paper, a hybrid model using modular neural networks and fuzzy logic was designed to provide a hypertension risk diagnosis for a person. The model considers age, risk factors, and the behaviour of blood pressure over a 24 h period, using the Framingham Heart Study as a basis. Blood pressure records are collected with ambulatory blood pressure monitoring (ABPM), a device that takes readings over a 24 h period. A modular neural network was designed with three modules: the first and second correspond to the systolic and diastolic pressures, and the third to the heart rate. Each module is trained with ABPM data from different patients so that the network learns the different behaviours blood pressure may exhibit. Different architectures and learning methods are also considered to obtain the best possible architecture. In addition, two fuzzy inference systems (FISs) are proposed for classification: the first for the heart rate level and the second for the night profile of the patient. These were tested with different types of membership functions, and the FIS that obtained the best results was selected. A third FIS is also used as a blood pressure classifier. The proposed methodologies were tested, for the modular neural network to find the architecture that produces the best results, and for the fuzzy inference systems to find the most suitable membership functions for the case study, yielding good overall results. For the modular neural network, the learning accuracy is 98% in the first module, 97.62% in the second module, and 97.83% in the third module.
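Testing "different types of membership functions" for the heart rate FIS amounts to swapping the shape of each fuzzy set while keeping the linguistic labels fixed. The sketch below shows triangular, trapezoidal, and Gaussian families for hypothetical heart-rate sets (low, normal, high) and picks the label with the highest membership. The set boundaries, loosely placed around the common 60 and 100 bpm cut-offs, and all function names are assumptions for illustration, not the parameters of the cited study.

```python
import math

# Hypothetical membership-function families for a heart-rate-level FIS.

def trimf(x, a, b, c):
    """Triangular MF: 0 at a, peak 1 at b, 0 at c."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapmf(x, a, b, c, d):
    """Trapezoidal MF: 0 outside [a, d], 1 on [b, c], linear ramps between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def gaussmf(x, mean, sigma):
    """Gaussian MF centred at mean with spread sigma."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Set parameters in beats per minute (illustrative assumptions only).
FAMILIES = {
    "triangular": {"low": (trimf, (20, 45, 70)), "normal": (trimf, (55, 80, 105)),
                   "high": (trimf, (90, 130, 200))},
    "trapezoidal": {"low": (trapmf, (0, 0, 55, 65)), "normal": (trapmf, (55, 65, 95, 105)),
                    "high": (trapmf, (95, 105, 300, 300))},
    "gaussian": {"low": (gaussmf, (45, 10)), "normal": (gaussmf, (80, 10)),
                 "high": (gaussmf, (120, 12))},
}

def memberships(bpm, family):
    """Degree of membership of a heart rate in each linguistic set."""
    return {label: mf(bpm, *params) for label, (mf, params) in FAMILIES[family].items()}

def heart_rate_level(bpm, family="gaussian"):
    """Label with the highest membership degree."""
    m = memberships(bpm, family)
    return max(m, key=m.get)

for fam in FAMILIES:
    print(fam, heart_rate_level(72, fam), memberships(72, fam))
```

Comparing families then means evaluating each one against labelled records and keeping the best performer, as the abstract describes; no such data is included here.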
For the night profile, the fuzzy system is compared with a traditional system of production rules: the fuzzy system gives all correct outputs, whereas the traditional system gives only 53% correct outputs. This is due to the uncertainty handling that fuzzy systems provide, which the traditional system lacks because its rules are very strict. Hybrid intelligent systems perform very well on this kind of complex problem, owing to the good learning in each module of the neural network and the fuzzy systems' effective management of classification uncertainty; together they form a hybrid combination that achieves good results.
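The contrast between strict production rules and a fuzzy night-profile classifier shows up at category boundaries. The sketch below computes the nocturnal dip as a percentage of the daytime mean and classifies it with the commonly used cut-offs (riser below 0%, non-dipper 0 to 10%, dipper 10 to 20%, extreme dipper above 20%). The fuzzy version replaces each hard threshold with a 4-point-wide linear ramp; those ramp widths and the function names are assumptions, and the sketch does not reproduce the study's rules or its 53% figure.

```python
# Hypothetical crisp vs fuzzy night-profile classification.

def dip_percent(day_mean, night_mean):
    """Nocturnal dip as a percentage of the daytime mean pressure."""
    return 100.0 * (day_mean - night_mean) / day_mean

def crisp_profile(dip):
    """Strict production rules: exactly one label, hard thresholds."""
    if dip < 0:
        return "riser"
    if dip < 10:
        return "non-dipper"
    if dip <= 20:
        return "dipper"
    return "extreme dipper"

def trapmf(x, a, b, c, d):
    """Trapezoidal MF: 0 outside [a, d], 1 on [b, c], linear ramps between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Fuzzy sets with ramps of assumed width 4 centred on each crisp threshold.
PROFILES = {
    "riser": (-100, -100, -2, 2),
    "non-dipper": (-2, 2, 8, 12),
    "dipper": (8, 12, 18, 22),
    "extreme dipper": (18, 22, 100, 100),
}

def fuzzy_profile(dip):
    """Graded membership of a dip percentage in every night-profile set."""
    return {label: trapmf(dip, *p) for label, p in PROFILES.items()}

for d in (9.9, 10.1):
    print(d, crisp_profile(d), fuzzy_profile(d))
```

A 0.2-point change in the dip flips the crisp label from non-dipper to dipper, while the fuzzy memberships move only slightly (about 0.52/0.48 to 0.48/0.52), which is the kind of uncertainty handling the abstract credits to the fuzzy system.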