All content in this area was uploaded by Salliah Shafi on Jul 02, 2021
Predictions of Diabetes and Diet Recommendation
System for Diabetic Patients using Machine Learning
Techniques
Salliah Shafi Bhat
Department of Computer Applications
School of Computer Information and Mathematical Sciences
B.S. Abdur Rahman Crescent Institute of Science and Technology
Chennai 600048, India
salliah_ca@crescent.education

Gufran Ahmad Ansari
Department of Computer Applications
School of Computer Information and Mathematical Sciences
B.S. Abdur Rahman Crescent Institute of Science and Technology
Chennai 600048, India
gufran@crescent.education
Abstract - Diabetes is a major metabolic disease that can
seriously affect the whole human body. It has become a common
disease across all age groups, from young to old. The number of
reported diabetic patients is rising day by day for many
reasons, including toxic or chemical contents mixed with food,
obesity, work culture, poor diet plans, irregular lifestyles,
eating habits, and environmental factors. Hence, diagnosing
diabetes is essential to save human lives. Machine learning
techniques can be used to develop an efficient healthcare system
that predicts different types of diabetes in advance. In this
paper, machine learning techniques are used to diagnose diabetes
and to recommend a proper diet for diabetic patients through a
Diet Recommendation System (DRS). Data analysis is used to
select the proper diet for diabetic patients.
Index Terms - Machine Learning, Patient, Prediction, Analysis,
Diet Recommendation System
I. INTRODUCTION
Diabetes is growing rapidly among individuals, particularly
young people, and has become a major challenge for researchers,
scientists, and educationists [1]. Diabetes is characterized by
an increase in the amount of sugar in the blood. It can be
divided into two classes: type 1 and type 2. Type 1 diabetes
has been reported to be an autoimmune disorder in which the body
destroys the cells that are involved in the production of
insulin, which is needed to consume sugar and produce energy.
This type can arise regardless of obesity. Obesity is an
increase in body mass index (BMI) above an individual's normal
BMI level [2]. Type 2 diabetes typically affects obese people in
the middle or older age groups. In this condition, the body
resists insulin or fails to produce enough of it. Other reported
causes of diabetes are bacterial or viral infection, poisonous
or chemical food material, allergic reactions, poor health
conditions, hormonal problems, eating habits, and contamination
of the environment. Diabetes causes various disorders such as
cardiovascular disorder, liver failure, retinopathy, and foot
ulcers [3]. Diabetes can be modelled using mathematical models.
To develop a better and more efficient diabetes prediction
strategy, this paper uses machine learning and data mining
techniques.
Data analysis is a method of analyzing and recognizing hidden
structures in voluminous quantities of data in order to extract
information. In healthcare systems, data analysis can be applied
to medical data using machine learning techniques to build
systems for medical diagnosis. Machine learning is a type of
Artificial Intelligence (AI) that allows a system to learn by
itself and to build decision-making knowledge models that
predict unknown data. Machine learning algorithms are classified
into three main types: supervised learning, unsupervised
learning, and semi-supervised learning. Supervised learning
algorithms are used when human expertise does not exist, when
humans are unable to explain their expertise (speech
recognition), when the solution changes over time (routing in
computer networks), or when the solution has to be adapted to
individual situations (user biometrics). Supervised learning
algorithms are categorized into various groups, such as
probability-based, function-based, rule-based, and tree-based.
Supervised learning is used for predictive modelling and
considers both input and output data during training, whereas
unsupervised learning, such as clustering, interprets data based
only on the input.
Unsupervised learning is descriptive in nature: it is used to
represent or summarize the data. Clustering and association rule
mining are examples of unsupervised learning algorithms.
Semi-supervised learning combines supervised and unsupervised
learning. In this paper, the authors propose a healthcare system
that predicts diabetes and recommends a diet for diabetic
patients. A supervised learning algorithm is used to learn from
diabetes data and to develop a prediction system for diabetes
diagnosis. The approach also uses data set pre-processing and
feature selection with machine learning, based on Age, Diagnosis
Duration, Diastolic Blood Pressure, Cholesterol level, and
Hemoglobin.
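The difference between supervised and unsupervised learning described above can be illustrated with a minimal sketch. It is not the system proposed in this paper: the glucose readings, the labels, and the function names `fit_supervised_threshold` and `two_means` are hypothetical, chosen only to show that the supervised step consumes the labels while the unsupervised step sees the inputs alone.

```python
from statistics import mean

# Hypothetical plasma glucose readings with diagnosis labels.
# The values are illustrative only and are not taken from the paper's dataset.
glucose = [85.0, 90.0, 95.0, 100.0, 150.0, 160.0, 170.0, 180.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = tested negative, 1 = tested positive


def fit_supervised_threshold(xs, ys):
    """Supervised: use inputs AND labels to place a decision threshold
    halfway between the two class means."""
    negative_mean = mean(x for x, y in zip(xs, ys) if y == 0)
    positive_mean = mean(x for x, y in zip(xs, ys) if y == 1)
    return (negative_mean + positive_mean) / 2.0


def two_means(xs, iterations=20):
    """Unsupervised: 1-D k-means with k=2, using the inputs only."""
    low, high = min(xs), max(xs)  # initial centroids
    for _ in range(iterations):
        near_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        near_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low, high = mean(near_low), mean(near_high)
    return low, high


threshold = fit_supervised_threshold(glucose, labels)
centroids = two_means(glucose)
print("supervised threshold:", threshold)
print("unsupervised centroids:", centroids)
```

On this toy data both approaches separate the same two groups, but only the supervised one knows which group corresponds to a positive diagnosis.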
The rest of the paper is organized as follows: Section II
reviews related work. Section III presents the feature
selection. Section IV discusses the machine learning model for
diabetic prediction. Section V presents the results and
discussion, and Section VI concludes the paper and outlines
future work.
2021 2nd International Conference for Emerging Technology (INCET)
Belgaum, India. May 21-23, 2021
978-1-7281-7029-9/21/$31.00 ©2021 IEEE
II. RELATED WORK
Khaleel et al. surveyed the data mining techniques used in
medical data analysis, in particular the classification and
regression tree (CART) algorithm, for finding locally frequent
diseases [3]. Zhao et al. implemented a predictive method for
subcutaneous glucose concentration; the proposed model targets
type 1 diabetes mellitus [4]. Machine learning is a
well-established research field in computer science that plays a
vital role in the growth of classification and predictive
analysis systems [5]. Reducing the overall number of features
lowers the average time required by an application [6].
Euclidean distance is used as the distance between classes, and
solution vectors are used to improve the similarity measure and
the reliability of classification; in developing smart and
intelligent computing systems, data mining algorithms are used
to reach the desired conclusions [7, 8]. Well-known machine
learning algorithms include decision trees, random forests, and
statistical data mining methods [9]. A supervised algorithm
focuses on inducing models [10]. Such models can be used for
classification and prediction [11]. A model has been proposed to
address problems with the specific performance measures used in
predictive models [12]. A classification based on local averages
can be applied to address the difficulties that arise in large
domains where the databases are incomplete [13].
The National Diabetes Statistics Report is a periodic
publication of the Centers for Disease Control and Prevention
(CDC) that provides updated statistics on diabetes for
professional audiences [14]. It includes information on diabetes
prevalence and incidence, prediabetes, acute and long-term
complications, risk factors, deaths, and costs [15]. An
estimated 33.9% of U.S. adults (84.1 million people) had
prediabetes, based on their fasting glucose level, and
prediabetes was observed in almost half (48.3 percent) of adults
aged 65 years or over. Diabetes is considered a major cause of
death in the USA [16]. Secured wireless body area networks
facilitate the creation of a predictive analysis framework in
healthcare [17]. With the increase in social
networking sites, the Internet of Things, and other data
sources, managing the immense amount of data remains a difficult
task. The literature indicates that an intelligent, effective,
and efficient cloud-based cluster model can manage this immense
amount of data [18]. Neural networks can be used for mining
hidden patterns, and genetic algorithms work well with such
patterns [19]. Given the abundance of distributed data from
various sources, the cluster, together with the cloud framework,
supports powerful data processing. Diwani et al. reviewed the
applications of data mining techniques in health care. Data
mining is a technique for knowledge discovery in databases (KDD)
and for visualizing data. In addition, textual diagnostic
evidence and medical images such as X-rays and magnetic
resonance imaging (MRI) are used for illness detection and
treatment. Darcy A. Davis et al. likewise described data mining
as a tool for discovering information in databases and
visualizing data. Ansari proposed a model of an Adoptive Medical
Diagnosis System (AMDS) using an expert system (ES) and
explained in a simple and clear way how the model helps patients
affected by common diseases [20, 21].
III. FEATURE SELECTION
Feature selection is used to improve the performance of
classification algorithms by removing meaningless and redundant
data. In diabetes care, many attributes are usually collected
but only a small number are used; that is, clinicians routinely
practise a form of feature selection. In practice, a large
number of noisy, irrelevant, and redundant features are included
in the data. The data collected for blood glucose control
prediction did not satisfy the requirements, and a lot of
irrelevant information was collected along with it, as shown in
Table I. The data set structure in Table I has 11 attributes:
Age, Diagnosis Duration, Diastolic Blood Pressure, Cholesterol
level, Hemoglobin, Plasma Glucose concentration, Body Mass Index
(BMI), Triceps Skinfold thickness, Diabetes Pedigree Function,
Serum Insulin, and Diabetes Diagnosis result ("tested positive"
or "tested negative"). Plasma glucose concentration is measured
in an oral glucose tolerance test. Triceps skinfold thickness is
a value used for body fat measurement, taken on the right arm
halfway between the olecranon process of the elbow and the
acromion process of the scapula. Serum insulin measures insulin,
the hormone that helps move the sugar known as glucose. From
these parameters we take a subset of the attributes in Table I
(Age, Cholesterol level, Diagnosis Duration, Hemoglobin, and
Diastolic Blood Pressure), on which we plan to focus for
diabetic patients in future. The mean and standard deviation of
these attributes are summarized in Table II.
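As a minimal sketch of the selection step described above, the following restricts a patient record from the eleven listed attributes to the five selected ones. The function name `select_features` and all values in the sample record are hypothetical; the paper does not specify how the selection is implemented.

```python
# The eleven attribute names listed in the text for Table I.
ALL_ATTRIBUTES = [
    "Age", "Diagnosis Duration", "Diastolic Blood Pressure",
    "Cholesterol level", "Hemoglobin", "Plasma Glucose concentration",
    "Body Mass Index", "Triceps Skinfold thickness",
    "Diabetes Pedigree Function", "Serum Insulin", "Diagnosis result",
]

# The five attributes the text singles out for further analysis.
SELECTED = [
    "Age", "Diagnosis Duration", "Diastolic Blood Pressure",
    "Cholesterol level", "Hemoglobin",
]


def select_features(record, keep=SELECTED):
    """Return a copy of `record` restricted to the attributes in `keep`."""
    missing = [name for name in keep if name not in record]
    if missing:
        raise KeyError(f"record lacks attributes: {missing}")
    return {name: record[name] for name in keep}


# Hypothetical patient record; every value is made up for illustration.
patient = dict(zip(
    ALL_ATTRIBUTES,
    [62, 5.6, 76.0, 4.0, 7.2, 110.0, 27.5, 23.0, 0.35, 90.0,
     "tested negative"],
))
reduced = select_features(patient)
print(reduced)
```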
TABLE I. DIABETIC PATIENTS HAVING UNIQUE ATTRIBUTES
TABLE II. DIABETIC PATIENTS WITH GOOD AND BAD GLUCOSE LEVELS
Patient Body Parameters        Good Control       Bad Control
                               Mean     SD        Mean     SD
Age                            62       9         66.84    9.23
Diagnosis Duration (Years)     5.6      9.73      9.22     6.55
Hemoglobin (HBA1C)             7.2      0.51      9.33     2.43
Diastolic Blood Pressure       76.55    13.7      146.3    21.7
Cholesterol Levels             4.03     1.17      4.16     1.20
Table II lists the important attributes selected from Table I
(Age, Diagnosis Duration, Diastolic Blood Pressure, Cholesterol
level, and Hemoglobin) and summarizes the mean and standard
deviation of each selected attribute.
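One way to read Table II is to ask how strongly each attribute separates the two groups. The sketch below computes a standardized mean difference from the published means and standard deviations. This measure is an illustration added here, not a method used in the paper, and it assumes the two groups can be pooled with equal weight because the group sizes are not reported.

```python
from math import sqrt

# (mean, SD) pairs copied from Table II: good control, then bad control.
TABLE_II = {
    "Age": ((62.0, 9.0), (66.84, 9.23)),
    "Diagnosis Duration (Years)": ((5.6, 9.73), (9.22, 6.55)),
    "Hemoglobin (HBA1C)": ((7.2, 0.51), (9.33, 2.43)),
    "Diastolic Blood Pressure": ((76.55, 13.7), (146.3, 21.7)),
    "Cholesterol Levels": ((4.03, 1.17), (4.16, 1.20)),
}


def standardized_difference(good, bad):
    """Difference of means divided by the root-mean-square of the two SDs
    (equal group weights are an assumption, see the lead-in)."""
    (mean_good, sd_good), (mean_bad, sd_bad) = good, bad
    pooled_sd = sqrt((sd_good ** 2 + sd_bad ** 2) / 2.0)
    return (mean_bad - mean_good) / pooled_sd


scores = {name: standardized_difference(good, bad)
          for name, (good, bad) in TABLE_II.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

Under these assumptions, Diastolic Blood Pressure and Hemoglobin differ most between the groups and Cholesterol Levels differ least, which matches a visual reading of the table.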
Fig. 1. Comparison of attributes: a) the result obtained with good control;
b) the result obtained with bad control
Fig. 1 compares the attributes under good control and under bad
control. It shows the distribution of the mean glucose level by
Age, Diagnosis Duration, Hemoglobin, Diastolic Blood Pressure,
and Cholesterol level, using the mean and standard deviation. In
the first result, all the patients have a glucose level under
good control during this period; in the second result, the mean
and standard deviation correspond to bad control of the sugar
level.
IV. MACHINE LEARNING FOR DIABETIC PREDICTION
Fig. 2 shows the complete prediction process for diabetic
patients, including the machine learning techniques used for
diabetic prediction within the Hospital Management System. The
model has two main components: the Hospital Management System
and the patients database.
Fig. 2. Machine Learning Model For Diabetic Patients
In Fig. 2, a patient comes to the hospital and registers
information such as body parameters, glucose level, and
cholesterol level. This is stored in the patients database
together with patient attributes such as Type 1 or Type 2
diabetes. Initially, the diabetes dataset is generated from
these data. We then select the attributes and apply clustering
and machine learning techniques. If the result is negative, we
store the data on the device and inform the patient that they do
not have diabetes and do not need any diabetic treatment. Mobile
devices store the result and provide updates on the patient's
activity, such as food type, medication, treatment updates,
control of events, and CGMS. Any emergency update or feedback is
also delivered through a smartphone and acted on accordingly. If
the result is positive, we proceed to the lab units and
recommend a diet and treatment for the patient. In addition,
diabetes is predicted by applying the learning model to a
person's medical record or results.
A. Activity Chart of Diabetes Prediction and Diet Plan
Fig. 3. Activity Chart of Diabetes Prediction and Diet Plan
A patient comes to the hospital and registers his/her
information in the Hospital Management System (HMS), and the
information is stored in the patients database. From the
patients database we select the patient's body attributes, apply
feature selection to them, and then apply machine learning
techniques to determine whether the patient is diabetic. If the
result is negative, there is no need to inform the patient. If
it is positive, the type of diabetes is checked; based on the
type, we prescribe treatment and recommend a diet plan, which
the patient has to follow. Otherwise, the process stops.
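The branching in the activity chart can be sketched as a small function. This is an illustrative outline only: the `Patient` class, `process_patient`, and the two placeholder rules `toy_classifier` and `toy_type` are hypothetical stand-ins for the trained model, and their cut-off values are arbitrary, not clinical thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Patient:
    name: str
    attributes: Dict[str, float]


def process_patient(patient: Patient,
                    database: List[Patient],
                    is_diabetic: Callable[[Dict[str, float]], bool],
                    diabetes_type: Callable[[Dict[str, float]], str]) -> str:
    """Follow the activity chart: register, classify, then branch."""
    database.append(patient)                  # register in the patients database
    if not is_diabetic(patient.attributes):   # negative result: stop here
        return "negative: no treatment needed"
    kind = diabetes_type(patient.attributes)  # positive: determine the type
    return f"positive ({kind}): recommend diet plan and treatment"


# Placeholder rules standing in for the trained model (hypothetical,
# with arbitrary cut-offs that carry no clinical meaning).
def toy_classifier(attrs: Dict[str, float]) -> bool:
    return attrs["Hemoglobin"] >= 8.0


def toy_type(attrs: Dict[str, float]) -> str:
    return "Type 2" if attrs["Age"] >= 40 else "Type 1"


db: List[Patient] = []
first = process_patient(Patient("A", {"Age": 62, "Hemoglobin": 7.2}),
                        db, toy_classifier, toy_type)
second = process_patient(Patient("B", {"Age": 66, "Hemoglobin": 9.3}),
                         db, toy_classifier, toy_type)
print(first)
print(second)
```

Passing the classifier and the type check as arguments keeps the flow independent of whichever model is eventually trained.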
V. RESULT & DISCUSSION
Table III illustrates the performance of the machine learning
algorithms (Decision Tree, Random Forest, and Naive Bayes) on
the diabetes dataset with respect to several evaluation measures
(precision, recall, F-measure, and accuracy), with the
pre-processing method (WPP) and without it (WOPP). Table III
also displays the performance of the machine learning algorithms
(NB, MLP, and RF) on the diabetes dataset under several test
methods, namely 10-fold cross validation (FCV), percentage split
(PS), and use of the training dataset (UTD), with and without
pre-processing. Without pre-processing, the PS test method
achieves greater precision than the other test methods for the
NB algorithm, and pre-processing further improves the
performance of the NB algorithm. With the exception of the UTD
test method, pre-processing also improves the precision of the
MLP algorithm. For the RF algorithm, the UTD test method
provides greater accuracy than the other test methods without
pre-processing. Moreover, with the exception of the FCV test
method, pre-processing improves the precision of the RF
algorithm. Fig. 4 shows a comparative analysis of the machine
learning techniques. The pre-processing method yields greater
average precision for NB.
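The three test methods (10-fold cross validation, a 66 percent percentage split, and use of the training dataset) differ only in how samples are divided between training and testing. The sketch below expresses each as a split over sample indices. The function names, the fixed seed, and the sample count are assumptions made for illustration; the paper does not describe its tooling.

```python
import random


def percentage_split(n_samples, train_fraction=0.66, seed=0):
    """Shuffle indices, then cut them into a training and a test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def training_set_split(n_samples):
    """Use the training dataset itself for testing (train equals test)."""
    indices = list(range(n_samples))
    return indices, indices


train, test = percentage_split(100)
print(len(train), len(test))
print(sum(1 for _ in k_fold_splits(100)))
```

Because testing on the training data reuses samples the model has already seen, its scores are usually optimistic compared with the other two methods.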
TABLE III. STATISTICAL COMPARISON OF MACHINE LEARNING TECHNIQUES
Method           Decision Tree    Naive Bayes    Random Forest
Precision (%)    86               90             93
Recall (%)       76               81             87
F_Measure (%)    81               85             92
Accuracy (%)     87               90             93
Fig. 4. COMPARATIVE ANALYSIS OF MACHINE LEARNING
TECHNIQUES
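For reference, the four measures in Table III are commonly defined from confusion-matrix counts as shown in the sketch below. The counts passed to `precision_recall_f_accuracy` are hypothetical, and the paper does not state which definition or averaging scheme it used, so this is the standard single-class formulation rather than a reproduction of the authors' computation. The second part recomputes the F-measure from the precision and recall values reported in Table III.

```python
def precision_recall_f_accuracy(tp, fp, fn, tn):
    """Return (precision, recall, F-measure, accuracy) for one positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy


def f_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts, chosen only to exercise the formulas.
p, r, f, a = precision_recall_f_accuracy(tp=40, fp=10, fn=10, tn=40)
print(p, r, f, a)

# Precision and recall (%) as reported in Table III.
TABLE_III = {
    "Decision Tree": (86, 76),
    "Naive Bayes": (90, 81),
    "Random Forest": (93, 87),
}
recomputed = {name: round(f_from_precision_recall(pr, rc))
              for name, (pr, rc) in TABLE_III.items()}
print(recomputed)
```

Under this definition the recomputed F-measure agrees with Table III for Decision Tree (81) and Naive Bayes (85) but gives about 90 for Random Forest, where the table reports 92. The source does not explain the difference; per-class averaging or rounding of the inputs are possible reasons.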
VI. CONCLUSIONS & FUTURE WORK
This paper proposed a diabetes prediction technique to support
diabetes awareness. To construct the machine learning model for
the detection of diabetes, several machine learning algorithms
have been used, namely the probabilistic naive Bayes (NB), the
function-based multilayer perceptron (MLP), and the
decision-tree-based random forest (RF). In addition, the machine
learning model is evaluated with several test methods, namely
10-fold cross validation (FCV), a 66 percent percentage split
(PS), and use of the training dataset (UTD), to assess the
accuracy of its results. The pre-processing approach is adopted
to improve the accuracy of the model, and the machine learning
algorithms are evaluated in two situations: with and without
pre-processing. Compared with the other machine learning
techniques, the pre-processing approach yields greater overall
precision for the Naive Bayes method. In future, this work will
be extended to improve its performance. Feature extraction and
selection is one of the key factors in classification, so future
work will examine feature extraction and selection to achieve
better classification. We plan to focus on diabetes patients
based on cholesterol level, blood pressure, and hemoglobin. It
can be seen that the disease has several other unidentified
causes.
REFERENCES
[1] Berger, Ashton C., et al. "A comprehensive pan-cancer molecular study
of gynecologic and breast cancers." Cancer cell 33.4 (2018): 690-705.
[2] Dean, Laura, and Jo McEntyre. "Introduction to Diabetes" The Genetic
Landscape of Diabetes [Internet].National Centre for Biotechnology
Information (US), 2004
[3] Khaleel, Mohammed Abdul, Sateesh Kumar Pradham, and G. N. Dash.
"A survey of data mining techniques on medical data for finding locally
frequent diseases" International Journal of Advanced Research in
Computer Science and Software Engineering 3.8 (2013): 149-153.
[4] Zhao, Chunhui, and Chengxia Yu."Rapid model identification for online
subcutaneous glucose concentration prediction for new subjects with
type I diabetes." IEEE Transactions on Biomedical Engineering 62.5
(2015): 1333-1344.
[5] Sun, Liang-Dan, et al. "Genome-wide association study identifies two
new susceptibility loci for atopic dermatitis in the Chinese Han
population" Nature genetics 43.7 (2011): 690-694.
[6] Bakshi, Sambit, et al. "Fast periocular authentication in handheld devices
with reduced phase intensive local pattern." Multimedia Tools and
Applications 77.14 (2018): 17595-17623.
[7] Yi, Hong, et al. "Recent advances in radical C–H activation/radical
cross-coupling." Chemical reviews 117.13 (2017): 9016-9085.
[8] Yu, Xue-Jie, et al. "Fever with thrombocytopenia associated with a
novel bunyavirus in China." New England Journal of Medicine 364.16
(2011): 1523-1532.
[9] Yuvaraj, N., and A. Sabari. "Twitter sentiment classification using
binary shuffled frog algorithm." Intelligent Automation & Soft
Computing 23.2 (2017): 373-381.
[10] Chapelle, Olivier, Vikas Sindhwani, and Sathiya S. Keerthi.
"Optimization techniques for semi-supervised support vector machines."
Journal of Machine Learning Research 9.Feb (2008): 203-233.
[11] Liu, Daqing, et al. "Learning to assemble neural module tree networks
for visual grounding." Proceedings of the IEEE International Conference
on Computer Vision, 2019.
[12] Kaveeshwar, Seema Abhijeet, and Jon Cornwall. "The current state of
diabetes mellitus in India" The Australasian medical journal 7.1 (2014):
45.
[13] Deputy, Nicholas P., et al. "Prevalence and changes in preexisting
diabetes and gestational diabetes among women who had a live birth—
United States, 2012–2016." Morbidity and Mortality Weekly Report
67.43 (2018): 1201.
[14] https://www.cdc.gov/features/diabetes-statistic-report/index.html
14.11.2020
[15] https://www.cdc.gov/diabetes/pdfs/data/statistics/national-diabetes-statistics-report.pdf
14.11.2020
[16] Kumar, R. Praveen, and S. Smys. "A novel report on architecture,
protocols and applications in Internet of Things (IoT)." 2018 2nd
International Conference on Inventive Systems and Control (ICISC),
IEEE, 2018.
[17] Zhang, Peng, et al. "Pattern mining model based on improved neural
network and modified genetic algorithm for cloud mobile networks."
Cluster Computing 22.4 (2019): 9651-9660.
[18] Diwani, Salim, et al. "Overview applications of data mining in health
care: the case study of Arusha region." International journal of
computational engineering research 3.8 (2013): 73-77.
[19] Davis, Darcy A., et al. "Predicting individual disease risk based on
medical history" Proceedings of the 17th ACM conference on
Information and knowledge management 2008.
[20] Ansari, G.A. "An Adoptive Medical Diagnosis System Using Expert
System with application." Journal of Emerging Trends in Computing and
Information Sciences, Vol. No. 4 March 2013.
[21] Ali Khalifah and Ansari, G.A. "Modeling of E-procurement System
through UML using Data Mining Technique for Supplier performance."
IEEE International Conference on Software Networking (ICSN), May 23-
26, Jeju Island, South Korea, 2016.