Predicting University's Students Performance Based on Machine Learning Techniques

Abstract

Machine learning algorithms have been used in many fields, such as economics and medicine. Educational data mining (EDM) is concerned with exploring patterns in data from educational environments and can be considered a branch of data mining. One of its most important uses is predicting students' performance in order to improve the existing educational situation. Predicting in advance makes it possible to know students' levels early and to identify students who need special attention. This paper proposes using the gradient boosting decision tree (GBDT) algorithm, a machine learning technique from the boosting family used for regression, classification, and ranking tasks, to predict university students' performance in final exams. It compares the performance of the proposed GBDT (Gradient Boosted Trees) model with selected machine learning algorithms (Support Vector Machine, Logistic Regression, and Naive Bayes).
Dindar Mikaeel Ahmed
Information Technology Dept.
Duhok Polytechnic University
Duhok, Iraq
Dindar.ahmed@dpu.edu.krd
Adnan Mohsin Abdulazeez
Research Center of Duhok
Polytechnic University
Duhok Polytechnic University
Duhok, Iraq
adnan.mohsin@dpu.edu.kd
Diyar Qader Zeebaree
Research Center of Duhok
Polytechnic University
Duhok, Kurdistan Region, Iraq
Dqszeebaree@dpu.edu.krd
Falah Y. H. Ahmed
Faculty of Information Sciences
& Engineering, Management &
Science University, Shah Alam,
Selangor, Malaysia
falah_ahmed@msu.edu.my
Keywords: Machine Learning, GBDT, Student Performance, EDM.
I. INTRODUCTION
Recently, machine learning algorithms have played an important role in most fields of science and life. These algorithms have proven their efficiency and their ability to perform classification and prediction with acceptable accuracy. Educational data mining (EDM) is one of the fields concerned with exploring patterns in data from the educational process [1-3]. One of the most important uses of EDM is predicting students' performance in order to improve the existing educational situation, and it can be considered a branch of data mining [4, 5]. Students' data is usually taken from student records, cooperative or interactive learning environments, or data recorded by school and university administrations [6, 7]. EDM aims to predict the future behavior of students. It identifies educational motives, reveals the extent of the learner's experience, and captures the learner's interaction with the educational strategy used. Furthermore, EDM discovers and improves related models: new models of the educational process can be found and enhanced to support learning styles [8-10]. It also makes it possible to assess the effect of educational support, through which new educational systems can be achieved. Finally, EDM develops knowledge about learning and learners, from which models can be constructed that may be useful in educational technology [11-13]. Much research indicates that early prediction of students' performance is necessary for the educational system [14-16]. Predicting students' academic performance is required in several cases, including applying specific educational methods to groups of students or identifying those who deserve scholarships [17, 18]. Predicting student performance enables educational institutions to use appropriate strategies to improve student performance [19-21]. Many studies have applied machine learning algorithms to student performance in educational data mining, an essential data mining sector at the crossroads of statistics, information science, and education [22-25]. The use of machine learning techniques depends mainly on the features used for prediction [26]. This paper presents suggested features and uses Logistic Regression, Naive Bayes, SVM, and Gradient Boosting Decision Trees (GBDT) to develop machine learning models, adopting the most common accuracy measures to compare the efficiency of the models. Information about undergraduate students at the university was analyzed to create a dataset. Preprocessing was used to replace missing data and identify important features, and the results were analyzed to find the best model and the features that affect students' performance, which in turn determines the importance of each feature for students' performance [54-55].
II. LITERATURE REVIEW
The use of machine learning in educational systems is a field of research and application that includes several areas, such as predicting students' performance, extracting data for decision-making, and exploring relationships [27]. Koutina et al. [28] used machine learning algorithms to predict students' performance in order to assist learners. The study was applied to 117 master's students, and NB and ANN obtained the best results compared with established classification algorithms. Ventura et al. [29] used several learning algorithms, including the proposed ICRM2 algorithm, to estimate student dropout in a sample of 419 high school students. Raza et al. [19] made early predictions for an institute's students using video learning analytics on a dataset of 772 students, relying on the students' academic information, effectiveness, and video interaction. With certain features identified, random forest determined students' performance better than the rest of the algorithms.
V. Uday et al. [30] applied machine learning methods to data collected from 200 university students in order to predict their performance. The study concluded that a student's passing or failing the exam can be predicted, and the highest accuracy was achieved by the
2021 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS 2021), 26 June 2021, Shah Alam, Malaysia.
978-1-6654-0343-6/21/$31.00 ©2021 IEEE
2021 IEEE International Conference on Automatic Control & Intelligent Systems (I2CACIS) | 978-1-6654-0343-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/I2CACIS52118.2021.9495862
decision tree algorithm. Dursun [2] conducted a study to predict the causes of the decline in the number of new students using educational institutions' data. A model was created with artificial intelligence to reveal the variables that cause the phenomenon. The study concluded that educational and financial factors are the most critical elements in decreasing student numbers.
SVM achieved the best accuracy. Fazal et al. [31] used decision trees to predict students' and parents' priorities as to whether to continue or leave the educational program; the system achieved an accuracy of 83.48% on a real dataset of 1020 students. Filipe et al. [18] predicted students' final grades using data collected online from 460 students and ML pipelines. The system predicted exam performance with an accuracy of up to 75.55%. Ammar et al. [32] used EMT to predict students' performance and showed its superiority over the most commonly used machine learning algorithms for the same purpose, with an accuracy of 98.5%. Hamza [20], within the concept of EDM, created a system to predict students' grades, compared machine learning algorithms, and suggested an improvement using BGA applied to CNN, NB, KNN, and C4.5. The results showed that all classifiers except NB improved their accuracy by 23%. Sumyea et al. [33] collected data from an Australian university to establish an effective student support system, using intelligent (tree-based and rule-based) methods. The models proved useful for designing the system. Alaa et al. [34] tested ML algorithms (REP Tree, J48, and Random Tree) to predict students' performance from 60 questions covering important areas. The results showed that the J48 algorithm performed best. Huan et al. [35] proposed a new method for predicting student performance on online interactive questions. The paper introduced new features (e.g., first attempt, first drag and drop, reflection time) based on mouse movement to capture the details of students' problem solving [56-57]. The results show that combining statistical features with artificial intelligence can achieve better results than traditional methods. M. Krishna et al. [36] used the Classification and Regression Tree (CART) algorithm to classify students and predict those at risk based on four activities: creating group wiki content, exchanging messages, opening course files, and taking online tests. The results demonstrated the possibility of identifying students at risk of failure. Hussein et al. [37] used machine learning methods to develop a classifier capable of forecasting student success. The algorithms included ANN, DT, Naïve Bayes, and Logistic Regression. The models were built from a survey given to students and from the students' grade book.
The ANN (a fully connected multilayer feed-forward network) was the best model. Lubna [38] explored the possibility of identifying key indicators in a small dataset. The best indicators were included in the machine learning algorithms, and SVM was the most accurate model. Alaa et al. [39], in a study to predict student performance using ANN, extracted features to identify the important questions. The researchers identified the 30 most important questions out of 161 through four methods: PCA, SVM, GAIN, and correlation. Fergie et al. [40] compared two algorithms (random forest and decision tree) for predicting computer science students' performance using information about GPA, family background, and demographics, and found that random forest is better. Maria et al. [41] designed an intelligent system to identify students at risk of failure and to improve academic performance. A deep network was the model selected for learning, and it showed better accuracy.
III. METHODOLOGY
In the proposed study, a set of machine learning algorithms is used to predict university students' performance in final exams, and their performance is compared with that of the proposed algorithm (GBDT). The proposed method includes several stages. The first stage includes collecting data, creating a dataset for students, and determining the important features. The second stage includes preprocessing: imputing missing data and handling noisy data. The third stage includes training the classification models and comparing their performance. The Weka platform, version 3.8.5, was used to implement the selected models.
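The preprocessing and splitting stages described above can be illustrated with a minimal Python sketch. It is not the Weka workflow used in the paper: the imputation rule (column mean for numerical attributes, most frequent value for nominal ones), the 25% test fraction, the helper names `impute` and `train_test_split`, and the four toy records are all assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean


def impute(rows, numeric_keys, nominal_keys):
    """Fill None values: column mean for numeric keys, most frequent value for nominal keys."""
    filled = [dict(r) for r in rows]  # copy so the input rows stay untouched
    for key in numeric_keys:
        observed = [r[key] for r in rows if r[key] is not None]
        fill = mean(observed)
        for r in filled:
            if r[key] is None:
                r[key] = fill
    for key in nominal_keys:
        observed = [r[key] for r in rows if r[key] is not None]
        fill = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[key] is None:
                r[key] = fill
    return filled


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle with a fixed seed and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records using three of the attribute names from the dataset.
students = [
    {"age": 20, "stage": "First", "label": "P"},
    {"age": None, "stage": "First", "label": "F"},
    {"age": 22, "stage": None, "label": "P"},
    {"age": 24, "stage": "Second", "label": "P"},
]
clean = impute(students, numeric_keys=["age"], nominal_keys=["stage"])
train, test = train_test_split(clean, test_fraction=0.25)
```

Here the missing age is replaced by the mean of the observed ages and the missing stage by the most frequent stage, after which the cleaned records are divided into a training part and a test part.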
Fig. 1. Proposed Framework
[Figure: boxes labelled "Collect data from students and create a dataset", "Pre-processing", "Train dataset", "Test dataset", "Algorithms", "Testing", and "Evaluate the results".]
A. Dataset
The data was collected from student records at the university, and the sample included 450 students across the four stages of bachelor's study. It included 20 attributes, including students' performance in the final exam; the attributes are shown in Table I.
TABLE I. STUDENT DATASET ATTRIBUTES
| Attribute Name | Data type | Category |
|---|---|---|
| No | Numerical | Range 0-450 |
| Site no | Numerical | Range 0-1250 |
| Age | Numerical | Range 18-28 |
| College | Nominal | College of Education |
| Department | Nominal | Computer science, mathematics, English language |
| Stage | Nominal | First, second, third, fourth |
| Sex | Nominal | M, F |
| Home address | Nominal | City names |
| Distance from university to home | Nominal | Near, medium, far |
| Number of family members | Numerical | Range 1-100 |
| Father's job | Nominal | Governmental, private sector |
| Mother's job | Nominal | Governmental, private sector |
| Father's scholastic achievement | Nominal | Uneducated, primary education, intermediate education, BCH, diploma, MSc, PHD |
| Mother's scholastic achievement | Nominal | Uneducated, primary education, intermediate education, BCH, diploma, MSc, PHD |
| Number of repetition years | Numerical | Range 0-5 |
| Internet usage rate | Numerical | Range 0-10 |
| Extra work | Nominal | Yes, no |
| Mid School degree | Numerical | Range 50-99 |
| Fatherless | Nominal | Yes, no |
| Choose of college | Nominal | My choice, by degree |
| Label | Nominal | P, F |
B. Proposed Algorithm
Three machine learning algorithms were compared to the
proposed model in this paper as follows:
Logistic Regression: a statistical technique used to describe the relationship between a binary dependent variable and one or more independent variables according to the following formula [42, 43]:

$$\ell = \log_b\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \qquad (1)$$

where $\ell$ is the log-odds, $b$ is the base of the logarithm, $\beta_i$ are the parameters of the model, and $p$ is the probability of the event. In machine learning, a dependent variable is known as a target variable, and independent variables are known as predictor variables or features.
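As a small illustration of Eq. (1), the following Python sketch evaluates the linear combination for one record and converts it to a probability by inverting the log-odds, assuming the natural logarithm as base. The coefficient and feature values are hypothetical and are not taken from the fitted model in this study.

```python
import math


def log_odds(p, base=math.e):
    """Left-hand side of Eq. (1): log_b(p / (1 - p))."""
    return math.log(p / (1.0 - p), base)


def predict_probability(betas, xs):
    """Invert Eq. (1) for base e: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return 1.0 / (1.0 + math.exp(-z))


betas = [-1.0, 0.5, 0.25]  # hypothetical beta_0, beta_1, beta_2
xs = [2.0, 4.0]            # hypothetical feature values x_1, x_2
p = predict_probability(betas, xs)  # linear part: -1 + 1 + 1 = 1
```

With these values the linear part equals 1, so `p` is about 0.731, and `log_odds(p)` returns 1 again, which shows that the two sides of Eq. (1) agree.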
Naive Bayes: a probability-based machine learning method built on Bayes' theorem, which is named after Thomas Bayes. It is used to classify data. This method assumes that the properties used to build the model are independent of one another, so changing one property's value does not affect another property [44]. It is one of the fastest algorithms for classification, and it is therefore used in real-time classification of data. In general, naïve Bayes works better with categorical data than with numerical data. It follows the formula [45]:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \qquad (2)$$

where $P(c)$ is the prior probability of the output, $P(x)$ is the probability of the predictor, $P(x \mid c)$ is the conditional probability, and $P(c \mid x)$ is the posterior probability.
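Eq. (2) combined with the independence assumption can be sketched for nominal attributes as follows. This is a minimal illustration without smoothing, so it assumes every queried value was seen at least once for some class; the helper names `train_nb` and `posterior` and the five toy records are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, label_key):
    """Count class frequencies and per-class value frequencies of each attribute."""
    class_counts = Counter(r[label_key] for r in rows)
    feature_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    for r in rows:
        for key, value in r.items():
            if key != label_key:
                feature_counts[(r[label_key], key)][value] += 1
    return class_counts, feature_counts


def posterior(model, x):
    """Return P(c | x) for every class c, following Eq. (2)."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for key, value in x.items():
            score *= feature_counts[(c, key)][value] / n_c  # P(x_k | c)
        scores[c] = score
    evidence = sum(scores.values())  # P(x)
    return {c: s / evidence for c, s in scores.items()}


rows = [
    {"distance": "near", "extra_work": "no", "label": "P"},
    {"distance": "near", "extra_work": "no", "label": "P"},
    {"distance": "far", "extra_work": "yes", "label": "P"},
    {"distance": "far", "extra_work": "yes", "label": "F"},
    {"distance": "near", "extra_work": "yes", "label": "F"},
]
model = train_nb(rows, "label")
post = posterior(model, {"distance": "near", "extra_work": "yes"})
```

For this query the unnormalized scores are 0.6 x 2/3 x 1/3 for class P and 0.4 x 1/2 x 1 for class F, so the posterior is 0.4 for P and 0.6 for F.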
SVM: a supervised learning method in machine learning that is employed in both classification and regression tasks. However, support vectors are mostly used in classification, so classification is the focus of this article [46]. The SVM is built on the concept of constructing a hyperplane that optimally divides the dataset into two groups [47, 48].
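The role of the separating hyperplane can be sketched in a few lines of Python. The weight vector `w` and offset `b` below are hypothetical values chosen by hand rather than learned by an SVM solver; the sketch only shows how a linear hyperplane assigns a point to one of two groups and how the margin width follows from `w`.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


w, b = [3.0, 4.0], -5.0            # hypothetical hyperplane parameters
side_a = classify(w, b, [3.0, 1.0])  # score 9 + 4 - 5 = 8
side_b = classify(w, b, [0.0, 0.0])  # score -5
width = margin_width(w)              # 2 / 5
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while the training points stay on the correct sides.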
Gradient Boosting Decision Tree: a machine learning method that uses the concept of gradient boosting to solve regression and classification problems; its idea is to build an improved predictive model from a set of weak predictive models [49]. GBDT uses decision trees to build the model: the error of the first model is calculated, and a new model is built to reduce it [59,60]. This process of building models continues until the lowest error rate is reached [50]. GBDT is one of the most powerful methods of data classification. Its key principle is that each decision tree is constructed along the gradient descent direction of a loss function, relying on the residuals of the previous models. The loss function measures the error of the trained model's output: the higher this value, the worse the model. One way of using the loss function is to let its value decrease along its gradient direction [58]. Different loss functions can be chosen; the least squares loss function, the least absolute deviation loss function, and the Huber loss function are all examples. GB is capable of solving regression and conditional classification problems [51]. The gradient boosting algorithm is presented below:
1. $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma)$
2. for $m = 1$ to $M$ do
3. $\tilde{y}_i = -\left[\dfrac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}, \quad i = 1, \dots, N$
4. build the $m$-th prediction model $h(x; a_m)$
5. $a_m = \arg\min_{a, \zeta} \sum_{i=1}^{N} \left[\tilde{y}_i - \zeta\, h(x_i; a)\right]^2$
6. $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{N} L\left(y_i, F_{m-1}(x_i) + \gamma\, h(x_i; a_m)\right)$
7. $F_m(x) = F_{m-1}(x) + pr \cdot \gamma_m\, h(x; a_m), \quad 0 < pr \le 1$
8. end for
In the beginning, the algorithm initializes the model with a constant $F_0(x)$, for example the mean of all instances' class labels. After that, over $M$ rounds, it trains $M$ models in total. The parameter $M$ is a hyperparameter of the algorithm: increasing $M$ reduces the error on the training set, but when set too high, it can lead to overfitting. For each model $m$, it computes the pseudo-residuals $\tilde{y}_i$, i.e., the value of the negative gradient of the loss function, $-\partial L(y_i, F(x_i)) / \partial F(x_i)$, evaluated at the $(m-1)$-th model, and then constructs the $m$-th prediction model $h(x; a_m)$. The next step obtains the parameters $a_m$ by fitting the model to the pseudo-residuals with the least squares method, so that the new model $h(x; a_m)$ follows the gradient direction as closely as possible. Next, it solves the one-dimensional optimization problem in line 6 for the step size $\gamma_m$. The final GB decision is obtained with the decision model $h(x; a_m)$: the ensemble is updated to $F_m(x)$. The hyperparameter $pr$ (learning rate) in the update rule in line 7 controls regularization by shrinkage in GB [52].
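The boosting loop described above can be sketched in plain Python under simplifying assumptions: the loss is the least squares loss $L = \tfrac{1}{2}(y - F)^2$, so the pseudo-residuals reduce to $y_i - F_{m-1}(x_i)$; each weak model is a one-split regression stump on a single numeric feature with at least two distinct values; and the step size in line 6 is computed in closed form. The function names and toy data are hypothetical, and this is not the Weka implementation used in the experiments.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Least squares stump: returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def gradient_boost(xs, ys, rounds=50, pr=0.1):
    f0 = mean(ys)                       # line 1: constant minimizing squared loss
    preds = [f0] * len(xs)
    model = []
    for _ in range(rounds):             # line 2
        residuals = [y - p for y, p in zip(ys, preds)]  # line 3: negative gradient
        stump = fit_stump(xs, residuals)                # lines 4-5
        h = [stump_predict(stump, x) for x in xs]
        denom = sum(v * v for v in h)
        # line 6: closed-form step size for the squared loss
        gamma = sum(r * v for r, v in zip(residuals, h)) / denom if denom else 0.0
        model.append((gamma, stump))
        preds = [p + pr * gamma * v for p, v in zip(preds, h)]  # line 7: shrunken update
    return f0, model, pr


def predict(fitted, x):
    f0, model, pr = fitted
    return f0 + pr * sum(g * stump_predict(s, x) for g, s in model)


xs = [1, 2, 3, 4, 5, 6]   # hypothetical single feature
ys = [0, 0, 0, 1, 1, 1]   # hypothetical labels (0 = fail, 1 = pass)
fitted = gradient_boost(xs, ys, rounds=50, pr=0.1)
```

On this toy data every round shrinks the remaining residual by the factor $1 - pr$, so after 50 rounds the prediction for `x = 1` is close to 0 and the prediction for `x = 6` is close to 1. A smaller `pr` slows this convergence, which is the shrinkage effect mentioned for line 7.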
IV. RESULT AND DISCUSSION
The performance of the selected machine learning algorithms was compared with that of the proposed algorithm. The results are shown in Table II.
TABLE II. PERFORMANCE COMPARISON OF MACHINE LEARNING ALGORITHMS
| No | Model | F-Measure | Precision | AUC | Recall | Accuracy |
|---|---|---|---|---|---|---|
| 1 | Logistic Regression | 0.902 | 0.907 | 0.1 | 0.898 | 82.6% |
| 2 | Naive Bayes | 0.924 | 0.945 | 0.9 | 0.945 | 86.2% |
| 3 | Support vector machine | 0.941 | 0.889 | 0.6 | 1.000 | 88.8% |
| 4 | Gradient Boosted Trees | 0.942 | 0.896 | 0.9 | 0.993 | 89.1% |
Fig. 2. Performance Comparison of Machine Learning Algorithms
As shown in Table II, this study compares the performance of the most popular machine learning algorithms used for binary classification. GBDT achieved the highest accuracy (89.1%) and F-Measure (0.942) among the selected algorithms. At the same time, SVM performed best on the Recall measure (1.000), and the logistic regression algorithm achieved 0.907 on the Precision measure. Based on these findings, the GBDT algorithm is a promising method for forecasting student success. The acceptable accuracy of the various machine learning algorithms indicates that students' performance can be predicted from the proposed dataset and that categorical classifiers perform well in this field. The ROC curves of the models are shown in the following figures:
[Figure panels: ROC of Naive Bayes model; ROC of Logistic Regression model; ROC of Gradient Boosted Trees model; ROC of Support vector machine model.]
Fig. 3. ROC performance
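The area under a ROC curve, reported as AUC in Table II, can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pair counting; the scores and labels are hypothetical and do not come from the study's models.

```python
def auc(scores, labels):
    """AUC by pair counting: fraction of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.35, 0.1]  # hypothetical classifier scores
labels = [1, 1, 0, 1, 0]             # 1 = pass, 0 = fail
area = auc(scores, labels)           # 5 of the 6 pairs are ordered correctly
```

A classifier that ranks every positive above every negative reaches an area of 1, while random scoring gives about 0.5.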
The GBDT algorithm has several advantages that make it suitable for predicting students' performance: it achieved the highest accuracy among the compared algorithms, and it helps to overcome the problem of overfitting.
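For reference, the measures reported in Table II can be computed from the counts of a binary confusion matrix as in the sketch below. The confusion matrix counts are hypothetical, since the paper does not list them; the last two lines recompute the F-Measure from the Precision and Recall values that Table II gives for Gradient Boosted Trees and the support vector machine.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure, and accuracy from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts: 90 true positives, 10 false positives, 5 false negatives, 45 true negatives.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=45)

# Precision and Recall as reported in Table II.
gbt_f = round(f_measure(0.896, 0.993), 3)
svm_f = round(f_measure(0.889, 1.000), 3)
```

The recomputed values, 0.942 and 0.941, match the F-Measure column of Table II for these two models.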
TABLE III. COMPARISON BETWEEN OUR PREDICTING AND SOME OTHER RELATED WORKS
| Dataset | Paper | Method | Accuracy |
|---|---|---|---|
| Real-world dataset of 1,021 records, collected from the examination database of the University of Peshawar | Fazal et al. [31] | Decision tree | 83.48% |
| Online judge | Filipe et al. [18] | RF | 77.29% |
| Dataset collected from College of Computer Science and Information Technology of Bsc | Ali S. H. et al. [53] | Logistic regression | 68.7% for passed and 88.8% for failed |
| Dataset collected from C.S. Dept., Universitas Klabat | Fergie et al. [40] | Decision tree | 66.9% |
| Collected from university students | This paper | GBDT | 89.1% |

[Fig. 2 chart: F-Measure, Precision, Recall, ROC Area, and Accuracy on a vertical axis from 0 to 1.5, with series for Logistic Regression, Naive Bayes, Support vector machine, and Gradient Boosted Trees.]
V. CONCLUSION
Predicting students' performance is one of the important areas of EDM. In this paper, a dataset was created to predict students' performance in final exams at an early stage. The models were built using selected machine learning algorithms on the proposed dataset, which included 450 university students and 20 features chosen for this purpose. The GBDT algorithm performed better than the rest of the algorithms, with an accuracy of 89.1%. EDM is a promising area for improving the performance of educational institutions through machine learning algorithms, and schools and educational institutions can make use of it. This study showed that students' performance can be predicted with acceptable accuracy by various selected algorithms applied to the dataset.
REFERENCES
[1] D. Zeebaree, H. Haron, and A. Mohsin Abdulazeez, “Gene Selection and
Classification of Microarray Data Using Convolutional Neural Network,”
11/29, 2018.
[2] D. Delen, “A comparative analysis of machine learning techniques for
student retention management,” Decision Support Systems, vol. 49, pp.
498-506, 11/01, 2010.
[3] D. Q. Zeebaree, A. M. Abdulazeez, D. A. Zebari, H. Haron and H. Nuzly,
"Multi-level fusion in ultrasound for cancer detection based on uniform
lbp features," Computers, Materials & Continua, vol. 66, no.3, pp. 3363-
3382, 2021.
[4] A. V. Manjarres, L. G. M. Sandoval, and M. J. S. Suárez, “Data mining
techniques applied in educational environments: Literature review,”
Digital Education Review, pp. 235-266, 06/01, 2018.
[5] J. Alzubi, A. Nayyar, and A. Kumar, “Machine Learning from Theory to
Algorithms: An Overview,” Journal of Physics: Conference Series, vol.
1142, pp. 012012, 11/01, 2018.
[6] A. Nájera, and J. de la Calleja, “Brief Review of Educational Applications
Using Data Mining and Machine Learning,” Revista Electrónica de
Investigación Educativa, vol. 19, pp. 84, 10/25, 2017.
[7] A. Mohsin Abdulazeez, D. Hajy, D. Zeebaree, and D. Zebari, “Robust
watermarking scheme based LWT and SVD using artificial bee colony
optimization,” Indonesian Journal of Electrical Engineering and
Computer Science, vol. 21, pp. 1218-1229, 02/01, 2021.
[8] M. Prada, M. Dominguez, J. Lopez Vicario, P. Alves, M. Barbu, M.
Podpora, U. Spagnolini, M. Pereira, and R. Vilanova, “Educational Data
Mining for Tutoring Support in Higher Education: A Web-Based Tool
Case Study in Engineering Degrees,” IEEE Access, vol. 8, pp. 212818-
212836, 01/01, 2020.
[9] Saleem, S., & Mohsin Abdulazeez, A. (2021). Hybrid trainable system for
writer identification of arabic handwriting. Comput. Mater. Contin..
[10] Hutner, T. L., Petrosino, A. J., & Salinas, C. (2019). Do preservice science
teachers develop goals reflective of science teacher education? A case
study of three preservice science teachers. Research in Science Education,
1-29.
[11] C. Dawson, "Educational data mining," pp. 114-119, 2019.
[12] V. Gudivada, V. Raghavan, V. Govindaraju, and C. R. Rao, Cognitive
Computing: Theory and Applications, 2016.
[13] A. Dutt, M. A. Ismail, and T. Herawan, “A Systematic Review on
Educational Data Mining,” IEEE Access, vol. 5, pp. 15991-16005, 01/17,
2017.
[14] O. Lu, A. Y. Q. Huang, J. C. H. Huang, A. J. Q. Lin, H. Ogata, and S.
Yang, “Applying learning analytics for the early prediction of students'
academic performance in blended learning,” Educational Technology and
Society, vol. 21, pp. 220-232, 01/01, 2018.
[15] J. Figueroa-Cañas, and T. Sancho-Vinuesa, “Early Prediction of Dropout
and Final Exam Performance in an Online Statistics Course,” IEEE
Revista Iberoamericana de Tecnologias del Aprendizaje, vol. PP, pp. 1-1,
04/13, 2020.
[16] M. Adnan, A. Habib, J. Ashraf, S. Mussadiq, A. Raza, M. Abid, M.
Bashir, and S. Khan, “Predicting at-Risk Students at Different
Percentages of Course Length for Early Intervention Using Machine
Learning Models,” IEEE Access, vol. PP, pp. 1-1, 01/05, 2021.
[17] M. Abuteir, and A. El-Halees, “Mining Educational Data to Improve
Students’ Performance: A Case Study,” International Journal of
Information and Communication Technology Research, vol. 2, pp. 140-
146, 01/01, 2012.
[18] F. Pereira, E. Teixeira de Oliveira, D. Fernandes, and A. Cristea, Early
performance prediction for CS1 course students using a combination of
machine learning and an evolutionary algorithm, 2019.
[19] R. Hasan, S. Palaniappan, S. Mahmood, A. Abbas, K. Sarker, and M.
Sattar, “Predicting Student Performance in Higher Educational
Institutions Using Video Learning Analytics and Data Mining
Techniques,” Applied Sciences, vol. 10, pp. 3894, 06/04, 2020.
[20] H. Turabieh, Hybrid Machine Learning Classifiers to Predict Student
Performance, 2019.
[21] Abdulazeez, A., Salim, B., Zeebaree, D., & Doghramachi, D. (2020).
Comparison of VPN Protocols at Network Layer Focusing on Wire Guard
Protocol.
[22] F. Ahmed, A. Mohsin Abdulazeez, V. Tiryaki, and D. Zeebaree,
“Management of Wireless Communication Systems Using Artificial
Intelligence-Based Software Defined Radio,” International Journal of
Interactive Mobile Technologies (iJIM), vol. 14, pp. 107, 08/14, 2020.
[23] D. Zeebaree, H. Haron, A. Mohsin Abdulazeez, and D. Zebari, Machine
learning and Region Growing for Breast Cancer Segmentation, 2019.
[24] M. Santana, E. Costa, B. Fonseca, J. Rêgo, and F. Araújo, “Evaluating the
effectiveness of educational data mining techniques for early prediction
of students' academic failure in introductory programming courses,”
Computers in Human Behavior, vol. 73, 02/01, 2017.
[25] A. Alshanqiti, and A. Namoun, “Predicting Student Performance and Its
Influential Factors Using Hybrid Regression and Multi-Label
Classification,” IEEE Access, vol. 8, pp. 203827-203844, 01/01, 2020.
[26] Zebari, N. A., Zebari, D. A., Zeebaree, D. Q., & Saeed, J. N. (2021).
Significant features for steganography techniques using deoxyribonucleic
acid: A review. Indonesian Journal of Electrical Engineering and
Computer Science, 21(1), 338-347.
[27] R. Baker, and K. Yacef, “The State of Educational Data Mining in 2009:
A Review and Future Visions,” Journal of Educational Data Mining, vol.
1, pp. 3-17, 01/01, 2009.
[28] M. Koutina, and K. Kermanidis, Predicting Postgraduate Students’
Performance Using Machine Learning Techniques, 2011.
[29] C. Márquez, A. Cano, C. Romero, A. Mohammad, H. Fardoun, and S.
Ventura, “Early Dropout Prediction using Data Mining: A Case Study
with High School Students,” Expert Systems, vol. 33, pp. 107-124, 02/01,
2016.
[30] V. Kumar, A. Krishna, P. Neelakanteswara, and Z. Basha, Advanced
Prediction of Performance of a Student in an University using Machine
Learning Techniques, 2020.
[31] F. Aman, A. Rauf, R. Ali, F. Iqbal, and A. Khattak, A Predictive Model
for Predicting Students Academic Performance, 2019.
[32] A. Almasri, E. Celebi, and R. Alkhawaldeh, “EMT: Ensemble Meta-
Based Tree Model for Predicting Student Performance,” Scientific
Programming, vol. 2019, pp. 13, 02/24, 2019.
[33] S. Helal, J. Li, L. Liu, E. Ebrahimie, S. Dawson, D. Murray, and Q. Long,
“Predicting academic performance by considering student heterogeneity,”
Knowledge-Based Systems, vol. 161, 07/01, 2018.
[34] A. Khalaf, A. Hashim, and W. Akeel, “Predicting Student Performance in
Higher Education Institutions Using Decision Tree Analysis,”
International Journal of Interactive Multimedia and Artificial Intelligence,
vol. 5, pp. 26-31, 09/01, 2018.
[35] H. Wei, H. Li, M. Xia, Y. Wang, and H. Qu, Predicting student
performance in interactive online question pools using mouse interaction
features, 2020.
[36] N. Bhargava, R. Purohit, S. Sharma, and A. Kumar, Prediction of arthritis
using classification and regression tree algorithm, 2017.
[37] H. a Hassan, “Predicting Students’ Performance Using Machine Learning
Techniques,” 04/01, 2019.
[38] L. Mahmoud Abu Zohair, “Prediction of Student’s performance by
modelling small dataset size,” vol. 16, pp. 18, 12/01, 2019.
[39] A. Khalaf, and A. Humadi, “Student’s Success Prediction Model Based
on Artificial Neural Networks (ANN) and A Combination of Feature
Selection Methods,” Xinan Jiaotong Daxue Xuebao/Journal of Southwest
Jiaotong University, vol. 54, 06/01, 2019.
[40] F. Kaunang, and R. Rotikan, Students' Academic Performance Prediction
using Data Mining, 2018.
[41] M. Tsiakmaki, G. Kostopoulos, S. Kotsiantis, and O. Ragos, “Transfer
Learning from Deep Neural Networks for Predicting Student
Performance,” Applied Sciences, vol. 10, pp. 2145, 03/21, 2020.
[42] V. Bewick, L. Cheek, and J. Ball, “Statistics review 14: Logistic
regression,” Critical care (London, England), vol. 9, pp. 112-8, 03/01,
2005.
[43] K. Xu, M. Zhou, D. Yang, Y. Ling, K. Liu, T. Bai, Z. Cheng, and J. Li,
“Application of Ordinal Logistic Regression Analysis to Identify the
Determinants of Illness Severity of COVID-19 in China,” Epidemiology
and Infection, vol. 148, pp. 1-25, 07/07, 2020.
[44] Y. H. A. Falah, “A Survey and Analysis of Image Encryption Methods,”
International Journal of Applied Engineering Research, 2017, Research
India Publications.
[51] Zebari, D. A., Zeebaree, D. Q., Abdulazeez, A. M., Haron, H., & Hamed,
H. N. A. (2020). Improved Threshold Based and Trainable Fully
Automated Segmentation for Breast Cancer Boundary and Pectoral
Muscle in Mammogram Images. IEEE Access, 8, 203097-203116.
[52] G. Seni, and J. Elder, Ensemble Methods in Data Mining: Improving
Accuracy Through Combining Predictions, 2010.
[53] C. Zhang, Y. Zhang, X. Shi, G. Almpanidis, G. Fan, and X. Shen, “On
Incremental Learning for Gradient Boosting Decision Trees,” Neural
Processing Letters, vol. 50, 08/01, 2019.
[54] A. Hashim, W. Akeel, and A. Hamoud, “Student Performance Prediction
Model based on Supervised Machine Learning Algorithms,” IOP
Conference Series: Materials Science and Engineering, vol. 928, pp.
032019, 11/19, 2020.
[55] Zajmi, L., Ahmed, F. Y., & Jaharadak, A. A. (2018). Concepts, methods,
and performances of particle swarm optimization, backpropagation, and
neural networks. Applied Computational Intelligence and Soft
Computing, 2018.
[56] F. Y. Ahmed, M. al Thiruchelvam, and S. L. Fong, “Improvement of
Vehicle Management System (IVMS),” in 2019 IEEE International
Conference on Automatic Control and Intelligent Systems (I2CACIS),
June 2019, pp. 44-49.
[57] A. H. Saleh, A. S. Yousif, and F. Y. Ahmed, “Information Hiding for
Text Files by Adopting the Genetic Algorithm and DNA Coding,” in 2020
IEEE 10th Symposium on Computer Applications & Industrial Electronics
(ISCAIE), April 2020, pp. 220-223.
[58] F. Y. Ahmed, R. Sreejith, and M. I. Abdullah, “Enhancement of
E-Commerce Database System During the COVID-19 Pandemic,” in 2021
IEEE 11th IEEE Symposium on Computer Applications & Industrial
Electronics (ISCAIE), April 2021, pp. 174-179.
[59] M. H. Alkawaz, S. D. Segar, and I. R. Ali, “A Research on the
Perception and Use of Electronic Books Among IT Students in
Management & Science University,” in 2020 16th IEEE International
Colloquium on Signal Processing & Its Applications (CSPA), 2020,
pp. 52-56.
[60] M. H. Alkawaz, H. Rajandran, and M. I. Abdullah, “The Impact of
Current Relation between Facebook Utilization and E-Stalking towards
Users Privacy,” in 2020 IEEE International Conference on Automatic
Control and Intelligent Systems (I2CACIS), 2020, pp. 141-147.
2021 IEEE International Conference on Automatic Control and Intelligent Systems (I2CACIS 2021), 26 June 2021, Shah Alam, Malaysia.
978-1-6654-0343-6/21/$31.00 ©2021 IEEE