2024 2nd International Conference on Advancement in Computation & Computer Technologies (InCACCT)
979-8-3503-7131-4/24/$31.00 ©2024 IEEE | DOI: 10.1109/InCACCT61598.2024.10551110
Evaluation of Random Forest and Support Vector
Machine Models in Educational Data Mining
Tsehay Admassu Assegie
School of Electronics Engineering,
Kyungpook National University,
Daegu, Republic of Korea
tsehayadmassu2006@gmail.com
Ayodeji Olalekan Salau
Department of Electrical and Computer
Engineering,
Afe Babalola University,
Ado-Ekiti, Nigeria
ayodejisalau98@gmail.com
Gunjan Chhabra
Department of CSE, Graphic Era Hill
University,
Dehradun, Uttarakhand, India,
chhgunjan@gmail.com
Keshav Kaushik
School of Computer Science, University of
Petroleum and Energy Studies,
Dehradun, Uttarakhand, India
officialkeshavkaushik@gmail.com
Sepiribo Lucky Braide
Department of Electrical and Electronics
Engineering, Rivers State University,
Port Harcourt 5080, Nigeria
braidesepiribo@yahoo.com
Abstract: The computer science field has witnessed the
popularity of machine learning (ML) in discriminating between
low-achieving and high-achieving students. However, different ML
methods perform differently in predicting student
performance. Therefore, analyzing their
effectiveness in discriminating students based on their
academic achievement has become a major research
concern. This study investigates the performance of
the random forest (RF) and support vector machine (SVM)
in predicting academic performance as measured by the
student grade score (SGS). The analysis is based on
the classification capability of the two algorithms on the
Portuguese SGS dataset. Furthermore, the study analyzes
the impact of the sigmoid and radial basis functions
on the capability of the SVM to classify SGS. We also
compare RF and SVM in identifying student performance based on
the SGS. Demographic information (age, sex) and
student assessment results (assignment, mid-term exam, and
quiz) were used as the training features. The results revealed
that the RF and SVM classifiers are able to predict student
performance. The SVM achieved higher accuracy than the RF. We
obtained an accuracy of 75.72% using the linear kernel. The
results imply that SGS can be predicted from previous
assessment results with the proposed SVM classifier.
Keywords: data mining, quality of education, education
student performance, classification
I. INTRODUCTION
The past few years have seen extensive research on
the analysis and classification of student academic
achievement (SAA). ML systems have
become significant in predicting SAA and providing early
information about low-achieving students [1]. Additionally, the
availability of a large volume of data in the educational
landscape has paved the way for ML to
discriminate between low and high achievers, possibly aiding the
analysis and extraction of knowledge about the factors
influencing SAA.
Hence, because of their highly accurate discriminative
capability, the use of ML systems has become prominent in
the classification of high and low-achieving students.
Moreover, these systems also aid in the investigation of the
factors that influence SAA. Thus, the evaluation of various
ML algorithms has become one of the most important research
topics in machine learning. In [2], the researchers investigated
the effectiveness of SVM, K-Neighbors, Naïve Bayes (NB),
Artificial Neural Network (ANN), and decision tree (DT) for
the classification of the SAA. The study highlighted that the
ANN model has higher performance compared to the other
models.
The application of ML has gained much research attention in
improving the SAA at higher education institutions. Authors in [3]
applied machine-learning methods to predict student dropout
at a higher education institution. The study analyzed the
accuracy of RF, SVM, DT, and ANN on student dropout
prediction. The result indicates that the RF classifier
outperforms the SVM, DT, and ANN. The RF achieved
an accuracy of 70.98%
on the test data used in the research.
Lau et al. [4] employed ANN to develop a system that
discriminates between high and low-achieving students. The study
showed that the ANN has been one of the dominant SAA
assessment techniques. The discrimination of low and high-
performing students with ANN helps teachers to
identify low achievers before they fail and to deliver
compensatory and other supportive sessions that aid them in
improving their academic achievements. The experiment
revealed that ANN has an accuracy of 84.8% on SAA
classification.
Similarly, a study [5] conducted a literature
survey on the application of ML in improving the
discrimination of high and low-achieving students by
analyzing their score quality. The researchers also highlighted
in their findings that ML has a wide range of applications in
the educational sector for the discrimination of the SAA [6].
The study highlighted that ensemble learning methods are
among the most commonly applied machine learning methods
for predicting students' academic performance.
SVM has also been used for developing an ML system to
identify the SAA by analyzing student performance data. It is
used for the classification of student grade scores as pass or
fail based on certain previous records [7]. I.K. Nti et al. [8]
supported the claim that ML systems have the power to
discriminate students based on performance by comparing
SVM with linear regression. The paper reported that
SVM had a lower mean squared error in
classifying those students likely to drop out.
Another study in [9] implemented a DT-based predictive
model for SAA. The study [10] employed students’ previous
grades to predict the grade score of the student in future
subjects. The experimental result demonstrates that the
implemented DT model achieves 63.63% accuracy on SAA
prediction. The study suggested that the implemented model
helps the administration evaluate and assess
student results in its decision-making.
Higher education institutions have their own standards for
evaluating student success [11]. However, they lack proper
procedures for handling student data and analyzing the
academic achievement of their students [12]. Thus, machine-
learning approaches have significance in implementing
appropriate procedures for predicting student academic
achievement in higher education institutions [13]. The study
in [14] developed DT and RF-based models for
predicting the SAA. The study compared the performance of
DT with RF [15]. The result of the comparison, with accuracy
as the performance metric, demonstrates that the RF model
achieves a higher accuracy of 69.9% on
SAA prediction [16].
Comparative studies of different machine learning
approaches for predicting SAA
[17] suggest that supervised learning methods have become
significant in discovering correlations among different
attributes of the student. The application of machine learning
approaches to student data assists in enhancing the
academic performance of higher education students.
Associative classification has become one of the significant
tools for predicting SAA [18]. A comparative study [19] on
different supervised learning methods such as DT, RF, NB,
and deep learning shows promising results. The RF model
gives an accuracy of 75.52% in predicting the SAA [20].
The literature survey in [21] shows that machine-learning
approaches are widely applicable to the educational sector for
analyzing educational data to improve education quality.
Thus, this study aims to investigate the RF and SVM
models. The objectives of this study are as
follows: (1) To study the performance of RF and SVM in
predicting student success. (2) To study
the effect of the SVM regularization parameters on the
performance of SVM. (3) To study the effect of tree depth
on the discriminative power of the RF in classifying
the SAA. The organization of the paper is as
follows: we describe the method and the source of data in
Section II, Section III focuses on the comparative result
analysis, and Section IV covers the conclusion. The research
focuses on the discriminative power of RF
and SVM for SAA, using performance indicators such as
accuracy, and on the SVM kernel functions, which include the radial
basis, polynomial, and sigmoid functions.
II. METHODOLOGY
The procedure followed in this analysis
of the RF and SVM for discriminating students by
their SGS involved the following steps, as suggested in
[22]. In the first step, we gathered the SAA dataset, including
demographic data, grade scores, attendance records, and
related variables that have an impact on the SGS, as previously
used in [23]. The dataset is the Portuguese
SAA dataset obtained from the Kaggle data repository and employed by
a previous study [24]. In the second step, we preprocessed
the collected dataset by cleaning missing values, removing
redundant data samples, removing outliers, and encoding
categorical variables. In the third step, the dataset was split
into training and testing sets to train and validate the SVM
and RF. In the fourth step, the RF and SVM were trained
on the training set using varying hyperparameters to obtain
good discriminative power. Finally,
validation measures such as accuracy and the confusion
matrix were used to analyze the effectiveness of the SVM and
RF on the test set; these measures help validate the predictive
power of the ML systems [25].
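The split, train, and validate steps described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the Portuguese dataset, and the 80/20 split and all hyperparameter values are assumptions.

```python
# Sketch only: synthetic data stands in for the Portuguese SGS dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in: 600 students, 10 numeric features, binary label.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# Split into training and test sets (the 80/20 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Train both classifiers (hyperparameters are illustrative only).
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf", C=1.0),
}

# Validate on the held-out test set with accuracy and a confusion matrix.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(name, round(results[name]["accuracy"], 4))
    print(results[name]["confusion_matrix"])
```

Fixing `random_state` keeps the split and the forest reproducible, which matters when accuracies from different hyperparameter settings are compared.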
The SAA dataset collected from the Kaggle repository includes
sex, parents' qualifications, and economic and academic
attributes. To implement the selected ML
methods, SVM and RF, for discriminating the students by their SGS,
we used the Python 3.8 programming language on an
Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz
with 8 GB RAM. We removed redundant samples and missing
values from the collected dataset before training the SVM and
RF. Additionally, we label-encoded the categorical
features to feed the input data to the RF and SVM. Figure 1
indicates the procedure we followed in implementing and
testing the RF and SVM-based predictors for SAA. The steps
range from data
collection to validation, as shown in Figure 1. Finally, each
model is validated on its prediction accuracy for SAA.
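The cleaning and label-encoding steps mentioned above might look like the following sketch. It assumes pandas, and the column names and records are hypothetical and do not reflect the actual schema of the Kaggle dataset.

```python
import pandas as pd

# Hypothetical records; column names are illustrative, not the dataset's schema.
raw = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F", None],
    "age": [17, 18, 17, 18, 17, 16],
    "parent_education": ["primary", "higher", "primary",
                         "secondary", "primary", "higher"],
    "grade": ["pass", "fail", "pass", "pass", "pass", "fail"],
})

# Remove rows with missing values, then exact duplicate rows.
clean = raw.dropna().drop_duplicates().reset_index(drop=True)

# Label-encode each text column: categories are sorted, then mapped to 0..k-1.
encoded = clean.copy()
for column in ["sex", "parent_education", "grade"]:
    encoded[column] = encoded[column].astype("category").cat.codes

print(encoded)
```

Label encoding imposes an arbitrary order on the categories. Tree-based models such as RF are usually insensitive to that order, whereas an SVM treats the codes as distances, so one-hot encoding may be worth trying for the SVM.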
Fig. 1. The block diagram for the proposed system
III. RESULTS AND DISCUSSION
The validation of SVM and RF in discriminating the SAA
has produced good results across various studies. A
research article [26] presented some key validation tests of
ML systems in SAA classification. The performance of
SVM and RF for the identification of the SAA is validated
based on prediction accuracy.
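As a reminder of what the accuracy figures measure, the short sketch below computes accuracy and the four confusion-matrix counts for a binary pass/fail problem in plain Python. The labels are made-up examples, and the function names are hypothetical helpers that do not come from the paper.

```python
def confusion_counts(y_true, y_pred, positive="pass"):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1  # predicted pass, actually pass
            else:
                fp += 1  # predicted pass, actually fail
        else:
            if truth == positive:
                fn += 1  # predicted fail, actually pass
            else:
                tn += 1  # predicted fail, actually fail
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for eight students.
y_true = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
y_pred = ["pass", "fail", "fail", "pass", "pass", "fail", "pass", "pass"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
print(accuracy(y_true, y_pred))
```

Accuracy equals (tp + tn) divided by the number of samples, so it hides whether the errors fall on low or high achievers; the confusion counts make that visible.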
The comparative investigation of the RF and the SVM
(with sigmoid) has shown that the SVM classifier has better
discriminative power than the RF classifier in identifying the
SAA based on grade score. Moreover, the results also
indicated that the discriminative ability of the SVM varies with
the kernel function used, such as the sigmoid, polynomial, and
radial basis functions. Higher discriminative power is
achieved by training the SVM with the sigmoid function as
compared with the other kernel functions. The performance of the
RF and SVM is presented in Sections III-A and III-B, respectively.
A. The Performance of the RF
RF is a popular machine learning algorithm that is widely
used in Educational Data Mining (EDM) due to its ability to
handle complex relationships in data, handle high-
dimensional feature spaces, and provide robust prediction [27].
It typically performs well in predicting student outcomes, such
as academic performance, dropout risk, or course completion.
Its ensemble nature, combining multiple decision trees, helps
reduce overfitting and improve prediction accuracy compared
to individual decision trees [28]. While RF is considered a
black-box model to some extent, it still offers some level of
interpretability through feature importance analysis [29].
Educators and researchers can gain insights into which
variables are most influential in predicting student outcomes
[30]. It can handle large datasets efficiently and is
parallelizable, making it suitable for processing vast amounts
of educational data. It can also handle imbalanced class
distributions, which are common in educational datasets [30].
Fig. 2 demonstrates the performance of the proposed RF in
predicting the SAA. Fig. 2 indicates that the random forest
achieves a highest accuracy of 84.02%, a minimum
accuracy of 59.72%, and an average accuracy of 71.84% in
predicting the SAA. Thus, the model is
effective in predicting students' learning outcomes, although
it has scope for improvement: even the highest accuracy
of 84.02% leaves roughly 16% of students
misclassified.
Fig. 2. The performance of the RF
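The paper does not state which setting was varied to obtain the minimum, maximum, and average RF accuracies. One plausible reading, given the stated objective on tree depth, is a sweep over the maximum depth of the trees. The sketch below illustrates that reading under stated assumptions: scikit-learn, synthetic stand-in data, and a depth range of 1 to 10 chosen only for illustration.

```python
from statistics import mean

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); not the Portuguese SGS dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Record the test accuracy for each maximum tree depth.
depth_accuracy = {}
for depth in range(1, 11):
    forest = RandomForestClassifier(
        n_estimators=100, max_depth=depth, random_state=0
    )
    forest.fit(X_train, y_train)
    depth_accuracy[depth] = float(forest.score(X_test, y_test))

# Summarize the sweep as minimum, maximum, and average accuracy.
scores = list(depth_accuracy.values())
summary = {"min": min(scores), "max": max(scores), "mean": mean(scores)}
print(summary)
```

Reporting the depth that produced each extreme alongside the summary would make results of this kind easier to reproduce.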
B. The Performance of the SVM
The performance of the SVM model is analyzed with
different kernel functions. The SVM model achieves different
accuracies for different kernels such as the sigmoid,
polynomial, and radial basis function (RBF). The sigmoid
SVM model achieves a higher accuracy of 83.33%
compared to the polynomial and RBF kernels. Figure 3
demonstrates the maximum, minimum, and average
accuracies of the SVM model using the sigmoid, polynomial,
and RBF kernels for predicting SAA.
Fig. 3. The performance of SVM
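A kernel comparison of this kind could be set up as in the sketch below. It is an illustration under assumptions and not the authors' experiment: it uses scikit-learn's `SVC`, synthetic stand-in data, an arbitrary grid of values for the regularization parameter `C`, and feature standardization, which the paper does not mention.

```python
from statistics import mean

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); not the Portuguese SGS dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# For each kernel, sweep the regularization parameter C (grid is illustrative).
kernel_summary = {}
for kernel in ("sigmoid", "poly", "rbf"):
    scores = []
    for C in (0.01, 0.1, 1.0, 10.0, 100.0):
        # Standardizing features is an added assumption; SVMs are scale-sensitive.
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=C))
        model.fit(X_train, y_train)
        scores.append(float(model.score(X_test, y_test)))
    kernel_summary[kernel] = {
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
    }

for kernel, stats in kernel_summary.items():
    print(kernel, stats)
```

Selecting `C` on the test set, as this sweep does for brevity, gives optimistic estimates; a separate validation split or cross-validation would be the safer choice in practice.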
C. Comparison of the RF and SVM
Table I indicates the accuracy achieved by the RF and
SVM models in predicting student academic performance. As
indicated in Table I, the SVM model achieves the higher average
accuracy. The RF model reaches a maximum accuracy of 84.02%,
close to the 84.72% of the SVM, but its
average accuracy is lower than that of the SVM
model for predicting the learning outcomes of students.
Overall, the SVM model achieved an average accuracy of 75.72%
while the RF model achieved 71.81%.
In terms of accuracy, the SVM has been shown to achieve
high prediction accuracy in tasks such as predicting student
academic performance, dropout, and learning styles. The
experiment suggests that SVM is better able
to handle noisy and highly correlated data, which
makes SVM a good choice for a
high-dimensional dataset such as the SAA. Furthermore,
the SVM has also shown high accuracy in discriminating
student outcomes, particularly because the kernel function
lets it model relationships between the input features and SAA
and form complex decision boundaries between good achievers
and those with lower scores. The linear kernel allowed SVM
to capture inherent patterns in the SAA data.
TABLE I. COMPARISON OF THE PERFORMANCE OF RF AND SVM.
Algorithm   Minimum accuracy   Maximum accuracy   Average accuracy
RF          59.72%             84.02%             71.81%
SVM         65.97%             84.72%             75.72%
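The gaps implied by Table I can be worked out directly. The short sketch below copies the six accuracy figures from the table and derives the difference in average accuracy and the spread between the best and worst result for each model; the variable names are illustrative.

```python
# Accuracy figures copied from Table I (percent).
table_1 = {
    "RF": {"min": 59.72, "max": 84.02, "mean": 71.81},
    "SVM": {"min": 65.97, "max": 84.72, "mean": 75.72},
}

# Gap in average accuracy, in percentage points.
mean_gap = round(table_1["SVM"]["mean"] - table_1["RF"]["mean"], 2)

# Spread between the best and worst accuracy for each model.
spread = {
    name: round(row["max"] - row["min"], 2) for name, row in table_1.items()
}

print(mean_gap)
print(spread)
```

The SVM average is 3.91 percentage points above the RF average, and the SVM results span 18.75 points against 24.30 points for the RF, so the SVM is both more accurate on average and less variable across the reported results.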
The experimental results also showed that RF is
robust against overfitting, and it is hence a good
choice for predictive analysis tasks with high dimensionality
or noisy data [31]. Moreover, as an ensemble method, it
combines various decision trees, which reduces variance and
improves the prediction of SAA.
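The variance reduction mentioned above can be made concrete with a standard result for averaged predictors. The symbols are assumptions introduced here and do not appear in the paper: $T_b(x)$ is the prediction of the $b$-th tree, $B$ is the number of trees, each tree has variance $\sigma^2$, and any two trees have pairwise correlation $\rho$.

```latex
\operatorname{Var}\!\left( \frac{1}{B} \sum_{b=1}^{B} T_b(x) \right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

The second term shrinks as $B$ grows, while the first term depends only on how correlated the trees are. Random feature selection at each split is the mechanism RF uses to lower $\rho$.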
Similarly, SVM can handle high-dimensional data and is
suitable for datasets with outliers. The main
objective of the SVM is to achieve good predictive accuracy
by finding the optimal (maximum-margin) decision boundary, leading to a good
discriminative capability. It is evident that both SVM and RF
have their strengths and weaknesses, and the selection of these
classifiers should be based on the type of dataset being
used, the task to be addressed, and the characteristics of the
dataset used in classification. However, further research is
needed to validate the potential of other ML classifiers for
SAA prediction. Fig. 4 indicates the comparison of the
performance of SVM and RF.
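For reference, the decision boundary and the kernel functions discussed in this section can be written in the standard soft-margin form. The notation is an assumption introduced here and is not taken from the paper: $(x_i, y_i)$ are the $n$ training samples with $y_i \in \{-1, +1\}$, $\phi$ is the feature map induced by the kernel $K$, $C$ is the regularization parameter, and $\gamma$, $r$, and $d$ are kernel parameters.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0

K_{\text{linear}}(x, x')  = x^{\top} x'
K_{\text{poly}}(x, x')    = \left( \gamma\, x^{\top} x' + r \right)^{d}
K_{\text{rbf}}(x, x')     = \exp\!\left( -\gamma \lVert x - x' \rVert^{2} \right)
K_{\text{sigmoid}}(x, x') = \tanh\!\left( \gamma\, x^{\top} x' + r \right)
```

A larger $C$ penalizes margin violations more heavily and fits the training data more closely, while a smaller $C$ tolerates more violations in exchange for a wider margin.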
Fig. 4. The accuracy of RF and SVM for SAA
In summary, RF and SVM have been shown to have
good predictive ability for the SAA, with each classifier
offering unique strengths in terms of prediction capability,
robustness, explainability, and computational complexity. The
choice between these algorithms depends on the specific
characteristics of the dataset, the type of value to
be predicted, and the priorities of the researcher or investigator.
By considering the trade-offs between these factors,
researchers can select a good-performing ML system for their
EDM application.
By following the procedures presented in Section II, we
can effectively validate SVM and RF in predicting the SAA
and support an informed choice of ML
classifier for SAA prediction. However, the performance of these
methods depends largely on the specific characteristics of the
SAA data (this study validated them on only one
SAA dataset), the nature of the classification problem, and the trade-
offs between accuracy, transparency, and computational
complexity. Researchers and investigators in EDM can benefit
by considering the strengths and limitations of SVM and RF
when choosing the better-performing classifier for their
specific needs.
IV. CONCLUSION
This paper proposed RF and SVM-based models for the
identification of SAA. The study employed various
data pre-processing approaches to improve the
ability of the proposed classifiers to
discriminate student learning outcomes. Moreover, the
study investigated the parameters of SVM and how the
performance of the SVM model is affected by the parameters
used during training. The results demonstrate that the
linear SVM model performs better than the RF model,
achieving an overall prediction accuracy of 75.72%. In
conclusion, the results of the experiment show that supervised
learning methods such as RF and SVM can assist in
improving education quality at higher education
institutions by providing high predictive power. Overall, both
RF and SVM have shown promising results in predicting
students' CGPA. RF tends to perform well in handling noisy
data and large datasets, while SVM is effective in high-
dimensional spaces and non-linear data. However, the choice
between the two algorithms ultimately depends on the specific
characteristics of the dataset and the goals of the prediction
task.
The results show that both RF and SVM are effective in
predicting student performance (predicting student cumulative
grade point average). RF is an ensemble
method that combines multiple decision trees, which reduces
the risk of overfitting and improves the generalization of the
model. SVM, on the other hand, is a powerful algorithm for
classification tasks, but it may not perform as well as RF
when dealing with large datasets or noisy data. Overall,
the study concludes that the choice of algorithm should be
based on the specific problem being addressed and the
characteristics of the dataset. Further research is needed to
explore the potential of other machine-learning algorithms for
educational data mining.
REFERENCES
[1].
A. Almasri, E. Celebi, and R.S. Alkhawaldeh., “G. Nguyen et al.,
“Machine Learning and Deep Learning frameworks and libraries for
large-scale data mining: a survey,” Hindawi Scientific Programming,
vol. 2019, pp. 114, 2019, doi: https://doi.org/10.1155/2019/3610248.
[2].
Y.A. Alsariera, “Assessment and Evaluation of Different Machine
Learning Algorithms for Predicting Student Performance,” Hindawi
Computational Intelligence and Neuroscience, vol. 2022, no. 1, pp. 1-11,
2022, doi: https://doi.org/10.1155/2022/4151487.
[3].
K. Dake, and C. Buabeng-Andoh, “Using Machine Learning
Techniques to Predict Learner Drop-out Rate in Higher Educational
Institutions,” Hindawi Mobile Information Systems, vol. 2022, no. 1,
2022, doi: https://doi.org/10.1155/2022/2670562.
[4].
E.T. Lau, L. Sun, and Q. Yang, “Modelling, prediction and
classification of student academic performance using artificial neural
networks,” SN Computer Science, 2019, doi:
https://doi.org/10.1007/s42452-019-0884-7.
[5].
P. Balaji et al., “Contributions of Machine Learning Models towards
Student Academic Performance Prediction: A Systematic Review,”
Applied Science 2021 doi: https://doi.org/10.3390/app112110007.
[6].
R. Hasan et al., “Student Academic Performance Prediction by Using
Decision Tree Algorithm,” IEEE, 2018.
[7].
M. Kamal et al., “Metaheuristics Method for Classification and
Prediction of Student Performance Using Machine Learning
Predictors,” Hindawi Mathematical Problems in Engineering, vol.
2022, pp. 15, 2022, doi: https://doi.org/10.1155/2022/2581951.
[8].
I.K. Nti et al., “An empirical assessment of different kernel functions
on the performance of support vector machines,” Bulletin of Electrical
Engineering and Informatics, vol. 10, no. 6, pp. 3403-3411, 2021, doi:
10.11591/eei.v10i6.3046.
[9].
M. Zaffar, and K.S. Savita, “A Study of Feature Selection Algorithms
for Predicting Students Academic Performance,” International Journal
of Advanced Computer Science and Applications, vol. 9, no. 5, pp.
541–549, 2019. [10] H. Gull et al., “Improving Learning Experience of
Students by Early Prediction of Student Performance using Machine
Learning,” IEEE, pp. 1-4, 2019.
[10].
F.J Kaunang et al., “Students’ Academic Performance Prediction using
Data Mining,” IEEE, 2019. [12] C.C Kiu et al., “Data Mining Analysis
on Student's Academic Performance through Exploration of Student's
Background and Social Activities,” IEEE, 2018, doi:
10.1109/ICACCAF.2018.8776809.
[11].
S. Biju, A.O. Salau, J.N. Eneh, V. E. Sochima, I. T. Ozue, “A Novel
Pre-Class Learning Content Approach for the Implementation of
Flipped Classrooms,” International Journal of Advanced Computer
Science and Applications (IJACSA), Vol. 11(7), pp. 131-136,
2020. DOI: 10.14569/IJACSA.2020.0110718
[12].
J. Sadowski, “Predicting Student Academic Performance in Computer
Science Courses: A Comparison of Neural Network Models,”
International Journal of Modern Education and Computer Science, vol.
1, no. 1, pp. 19, 2018, doi: 10.5815/ijmecs.2018.06.01
[13].
H. Karalar, C. Kapucu, and H. Gürüler, “Predicting students at risk of
academic failure using ensemble model during the pandemic in a
distance learning system,” International Journal of Education in Higher
Education, vol. 18, no. 63, pp. 118, 2021, doi:
https://doi.org/10.1186/s41239-021-00300-y.
[14].
E. Alyahyan, and Dilek Düştegör, “Predicting academic success in
higher education: literature review and best practices,” International
Journal of Education in Higher Education, vol. 17, no. 3, pp. 121,
2020, doi: https://doi.org/10.1186/s41239-020-0177-7.
[15].
D.S Maylawati et al., “Data science for digital culture improvement in
higher education using K-means clustering and text analytics,”
International Journal of Electrical and Computer Engineering., vol. 10,
no. 5, 2020, pp. 4569-4580, doi: 10.11591/ijece.v10i5.pp4569-4580.
[16].
A.S. Hashim, W.D. Awadh, and A.K. Hamoud, “Student Performance
Prediction Model based on Supervised Machine Learning Algorithms,”
Materials Science and Engineering, 2020,
doi: 10.1088/1757-899X/928/3/032019.
[17].
L. Cagliero et al., “Predicting Student Academic Performance using
Associative Classification,” Applied Sciences., pp. 1–22, 2021, doi:
https://doi.org/10.3390/app11041420.
[18].
N.A. Yassein, R.G.Helali, and S.B. Mohomad, “Predicting Student
Academic Performance in KSA using Data Mining Techniques”
Journal of Information Technology & Software Engineering, vol. 7, no.
5, pp. 15, 2017, doi: 10.4172/2165-7866.1000213
[19].
L. Nanglae, “Determining patterns of student graduation using a bi-
level learning framework,” Bulletin of Electrical Engineering and
Informatics, vol. 10, no. 4, 2021, pp. 2201-2211, doi:
10.11591/eei.v10i4.2502.
[20].
S.A. Alwarthan, N. Aslam, and I.U. Khan, "Predicting Student
Academic Performance at Higher Education Using Data Mining: A
Systematic Review," Hindawi Applied Computational Intelligence and
Soft Computing, 2022, doi: https://doi.org/10.1155/2022/8924028.
[21].
S. Leonelli and N. Tempini, “Predicting Student Performance to
Improve Academic Advising Using the Random Forest Algorithm,"
International Journal of Distance Education Technologies. vol. 20, no.
1, pp. 1-17, doi: https://orcid.org/0000-0001-8440-5889.
[22].
S. Huang, and J. Wei, "Student Performance Prediction in Mathematics
Course Based on the Random Forest and Simulated Annealing,"
Hindawi Scientific Programming, 2022, doi:
https://doi.org/10.1155/2022/9340434.
[23].
D.T. Ha et al., “An Empirical Study for Student Academic Performance
Prediction Using Machine Learning Techniques,” International Journal
of Computer Science and Information Security, vol. 18, no. 3, 2020.
[24].
A. Asselman et al., “Enhancing the prediction of student performance
based on the machine learning XGBoost algorithm,” Interactive
Learning Environments, vol. 31, no. 6, 2023.
[25].
A. Triayudi, and I Fitri, “Comparison of the feature selection algorithm
in educational data mining,” TELKOMNIKA Telecommunication,
Computing, Electronics and Control vol. 19, No. 6, December 2021,
pp. 1865~1871, DOI: 10.12928/TELKOMNIKA.v19i6.21594.
[26].
S.T. Ahmed, R. Al-Hamdani, and M.S. Croock, “Enhancement of
student performance prediction using modified K-nearest neighbor,”
TELKOMNIKA Telecommunication, Computing, Electronics and
Control Vol. 18, No. 4, August 2020, pp. 1777-1783, DOI:
10.12928/TELKOMNIKA.v18i4.13849.
[27].
M. Yağcı, “Educational data mining: prediction of students' academic
performance using machine learning algorithms,” Smart Learn.
Environ. 9, 11 (2022). https://doi.org/10.1186/s40561-022-00192-z
[28].
W. Xing, R. Guo, E. Petakovic & S. Goggins, Participation-based
student final performance prediction model through interpretable
Genetic Programming: Integrating learning analytics, educational data
mining and theory. Computers in Human Behavior, 47, pp. 168-181,
2015.
[29].
P. Dabhade, R. Agarwal, K.P. Alameen, A.T. Fathima, R. Sridharan,
G. Gopakumar, “Educational data mining for predicting students’
academic performance using machine learning algorithms,” Materials
Today: Proceedings, 2021. doi: 10.1016/j.matpr.2021.05.646
[30].
P. Chaudhury, and H.K. Tripathy, “A novel academic performance
estimation model using two-stage feature selection,” Indonesian
Journal of Electrical Engineering and Computer Science, Vol. 19, No.
3, September 2020, pp. 1610-1619, DOI:
10.11591/ijeecs.v19.i3.pp1610-1619.
[31].
S. Mohamed, and A. Ezzati, “A data mining process using
classification techniques for employability prediction, and Future
Opportunities,” Indonesian Journal of Electrical Engineering and
Computer Science, vol. 14, no. 2, May 2019, pp. 1025-1029, DOI:
10.11591/ijeecs.v14.i2.pp1025-1029.
135
Authorized licensed use limited to: University of Petroleum & Energy Studies. Downloaded on June 12,2024 at 03:14:04 UTC from IEEE Xplore. Restrictions apply.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Recently, students dropping out of school at the tertiary level without prior notice or permission has intrigued deep concern among academic authorities, instructors, and counsellors. It has therefore become necessary to understand factors that lead to high attrition rates among learners and identify at-risk students for urgent academic counselling. In providing a proactive response to learner attrition, the study deployed a machine learning algorithm with high model accuracy to predict students’ drop-out rates and identify dominant attributes that affect learner attrition and retention. An attrition model was built and validated among support vector machine, decision tree, multilayer perceptron, and random forest algorithms. The machine learning algorithms were tested for accuracy, precision, recall, F-measure, and ROC using the 10-fold and the 5-fold comparative cross-validation techniques. In addition to the cross-validation technique, the chi-square feature selection mechanism was implemented to understand the algorithms’ training time and accuracy. The random forest emerged as the best-performing algorithm, with an accuracy of 70.98% and 69.74% for the 10-fold and the 5-fold cross-validation implementations, respectively.
Article
Full-text available
Recently, educational institutions faced many challenges. One of these challenges is the huge amount of educational data that can be used to discover new insights that have a signi€cant contribution to students, teachers, and administrators. Nowadays, researchers from numerous domains are very interested in increasing the quality of learning in educational institutions in order to improve student success and learning outcomes. Several studies have been made to predict student achievement at various levels. Most of the previous studies were focused on predicting student performance at graduation time or at the level of a speci€c course. e main objective of this paper is to highlight the recently published studies for predicting student academic performance in higher education. Moreover, this study aims to identify the most commonly used techniques for predicting the student’s academic level. In addition, this study summarized the highest inˆuential features used for predicting the student academic performance where identifying the most inˆuential factors on student’s performance level will help the student as well as the policymakers and will give detailed insights into the problem. Finally, the results showed that the RF and ensemble model were the most accurate models as they outperformed other models in many previous studies. In addition, researchers in previous studies did not agree on whether the admission requirements have a strong relationship with students’ achievement or not, indicating the need to address this issue. Moreover, it has been noticed that there are few studies which predict the student academic performance using students’ data in arts and humanities major.
Over the last few decades, there has been a gradual deterioration in higher education in all three areas: the academic setting (both staff and students), as well as research and development output (including graduates). All colleges and universities are essentially focused on improving management decision-making and educating pupils. High-quality higher education can be obtained through a variety of methods. One method is to accurately forecast pupils’ achievement in their chosen educational context. There are numerous prediction models from which to pick. While it is unclear whether there are any markers that can predict whether a student will be an academic genius, a dropout, or an average performer, the researcher reports student achievement. This article presents a metaheuristics and machine learning-based method for the classification and prediction of student performance. Firstly, features are selected using a relief algorithm. Machine learning classifiers such as BPNN, RF, and NB are then used to classify student academic performance data. BPNN achieved better accuracy for classification and prediction of student academic performance.
Student performance is crucial to the success of tertiary institutions. Especially, academic achievement is one of the metrics used in rating top-quality universities. Despite the large volume of educational data, accurately predicting student performance becomes more challenging. The main reason for this is the limited research in various machine learning (ML) approaches. Accordingly, educators need to explore effective tools for modelling and assessing student performance while recognizing weaknesses to improve educational outcomes. The existing ML approaches and key features for predicting student performance were investigated in this work. Related studies published between 2015 and 2021 were identified through a systematic search of various online databases. Thirty-nine studies were selected and evaluated. The results showed that six ML models were mainly used: decision tree (DT), artificial neural networks (ANNs), support vector machine (SVM), K-nearest neighbor (KNN), linear regression (LinR), and Naive Bayes (NB). Our results also indicated that ANN outperformed other models and had higher accuracy levels. Furthermore, academic, demographic, internal assessment, and family/personal attributes were the most predominant input variables (e.g., predictive features) used for predicting student performance. Our analysis revealed an increasing number of research in this domain and a broad range of ML algorithms applied. At the same time, the extant body of evidence suggested that ML can be beneficial in identifying and improving various academic performance areas.
Educational data mining has become an increasingly popular research field in recent years, mainly with the help of cross research conducted by various disciplines, so as to solve various difficult problems in the teaching and education process. In this paper, we proposed a hybrid approach for student performance prediction. We collected the dataset, including 15 characteristics of students from three categories (individual basic information, individual education information, and individual behavior information). Based on the random forest (RF) and simulated annealing (SA) algorithms, we binary encode the relevant parameters (number of features, tree size, and tree decision weights) as the target variables for algorithm optimization, use the out-of-bag error as the optimization objective function, and then propose the IRFC (improved random forest classifier) algorithm in this paper. Compared with other mainstream improved random forest algorithms, the research results demonstrate that the proposed algorithm in this paper has higher generalization ability and smaller OOB error. This study provides a methodological reference for the prediction of student achievement and also makes a marginal contribution to student management work.
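The idea of using simulated annealing to search hyperparameters against an error objective can be sketched as follows. This is a simplified illustration, not the IRFC algorithm: it tunes a single integer directly rather than a binary encoding of several parameters, and `toy_oob_error` is a hypothetical stand-in for a real out-of-bag error measured on a trained forest.

```python
import math
import random


def toy_oob_error(n_features):
    """Hypothetical stand-in for an out-of-bag error curve (minimum at 6 features)."""
    return 0.20 + 0.01 * (n_features - 6) ** 2


def simulated_annealing(objective, start, low, high,
                        steps=500, temp=1.0, cooling=0.99, seed=0):
    """Minimise objective over the integers in [low, high] by simulated annealing."""
    rng = random.Random(seed)
    current, current_cost = start, objective(start)
    best, best_cost = current, current_cost
    for _ in range(steps):
        # Propose a neighbouring value, clamped to the allowed range.
        candidate = min(high, max(low, current + rng.choice((-1, 1))))
        candidate_cost = objective(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling  # Geometric cooling schedule (an assumption of this sketch).
    return best, best_cost


best_n_features, best_error = simulated_annealing(toy_oob_error, start=15, low=1, high=15)
```

The early high temperature lets the search accept worse settings and escape poor regions; as the temperature decays, the search behaves more like greedy descent.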
Educational data mining has become an effective tool for exploring the hidden relationships in educational data and predicting students' academic achievements. This study proposes a new model based on machine learning algorithms to predict the final exam grades of undergraduate students, taking their midterm exam grades as the source data. The performances of the random forests, nearest neighbour, support vector machines, logistic regression, Naïve Bayes, and k-nearest neighbour algorithms, which are among the machine learning algorithms, were calculated and compared to predict the final exam grades of the students. The dataset consisted of the academic achievement grades of 1854 students who took the Turkish Language-I course in a state University in Turkey during the fall semester of 2019–2020. The results show that the proposed model achieved a classification accuracy of 70–75%. The predictions were made using only three types of parameters; midterm exam grades, Department data and Faculty data. Such data-driven studies are very important in terms of establishing a learning analysis framework in higher education and contributing to the decision-making processes. Finally, this study presents a contribution to the early prediction of students at high risk of failure and determines the most effective machine learning methods.
Predicting students at risk of academic failure is valuable for higher education institutions to improve student performance. During the pandemic, with the transition to compulsory distance learning in higher education, it has become even more important to identify these students and make instructional interventions to avoid leaving them behind. This goal can be achieved by new data mining techniques and machine learning methods. This study took both the synchronous and asynchronous activity characteristics of students into account to identify students at risk of academic failure during the pandemic. Additionally, this study proposes an optimal ensemble model predicting students at risk using a combination of relevant machine learning algorithms. Performances of over two thousand university students were predicted with an ensemble model in terms of gender, degree, number of downloaded lecture notes and course materials, total time spent in online sessions, number of attendances, and quiz score. Asynchronous learning activities were found more determinant than synchronous ones. The proposed ensemble model made a good prediction with a specificity of 90.34%. Thus, practitioners are suggested to monitor and organize training activities accordingly.
Artificial intelligence (AI) and machine learning (ML) have influenced every part of our day-to-day activities in this era of technological advancement, making life more comfortable. Among the several AI and ML algorithms, the support vector machine (SVM) has become one of the most generally used algorithms for data mining, prediction and other AI and ML activities in several domains. The SVM's performance depends significantly on the kernel function (KF); nonetheless, there is no universally accepted basis for selecting an optimal KF for a specific domain. In this paper, we investigate empirically the effect of different KFs on SVM performance in various fields. We illustrated the performance of the SVM based on different KFs through extensive experimental results. Our empirical results show that no single KF is always suitable for achieving high accuracy and generalisation in all domains. However, the Gaussian radial basis function (RBF) kernel is often the default choice. Also, if the KF parameters of the RBF and exponential RBF are optimised, they outperform the linear and sigmoid KF based SVM method in terms of accuracy. Besides, the linear KF is more suitable for linearly separable datasets.
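For readers unfamiliar with the kernel functions compared in this abstract, the following minimal sketch writes out the standard linear, RBF, and sigmoid kernels in plain Python. The parameter names `gamma` and `coef0`, their default values, and the sample points are assumptions chosen for illustration; the sketch does not train an SVM.

```python
import math


def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """K(x, y) = tanh(gamma * <x, y> + coef0)"""
    return math.tanh(gamma * linear_kernel(x, y) + coef0)


def gram_matrix(points, kernel):
    """Pairwise kernel values, the matrix an SVM solver works with."""
    return [[kernel(p, q) for q in points] for p in points]


# Illustrative points only.
points = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
linear_gram = gram_matrix(points, linear_kernel)
rbf_gram = gram_matrix(points, rbf_kernel)
```

The RBF kernel depends only on the distance between points, which is why its `gamma` parameter usually needs tuning, whereas the linear kernel has no such parameter.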
Machine learning is emerging nowadays as an important tool for decision support in many areas of research. In the field of education, both educational organizations and students are the target beneficiaries. It facilitates the educational sector in predicting the student’s outcome at the end of their course and for the students in deciding to choose a suitable course for them based on their performances in previous exams and other behavioral features. In this study, a systematic literature review is performed to extract the algorithms and the features that have been used in the prediction studies. Based on the search criteria, 2700 articles were initially considered. Using specified inclusion and exclusion criteria, quality scores were provided, and up to 56 articles were filtered for further analysis. The utmost care was taken in studying the features utilized, database used, algorithms implemented, and the future directions as recommended by researchers. The features were classified as demographic, academic, and behavioral features, and finally, only 34 articles with these features were finalized, whose details of study are provided. Based on the results obtained from the systematic review, we conclude that the machine learning techniques have the ability to predict the students’ performance based on specified features as categorized and can be used by students as well as academic institutions. A specific machine learning model identification for the purpose of student academic performance prediction would not be feasible, since each paper taken for review involves different datasets and does not include benchmark datasets. However, the application of the machine learning techniques in educational mining is still limited, and a greater number of studies should be carried out in order to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work.